ABSTRACT

We focus on the problem of adding fault-tolerance to an existing
concurrent protocol in the presence of unchangeable environment
actions. Such unchangeable actions occur in practice for several
reasons. One instance is the case where only a subset of the
components/processes can be revised and the other
components/processes must be used as is. Another instance is
cyber-physical systems, where revising physical components may be
undesirable or impossible. These actions differ from faults in that
they are simultaneously assistive and disruptive, whereas faults are
only disruptive. For example, if these actions are a part of a
physical component, their execution is essential for the normal
operation of the system. However, they can potentially disrupt
actions taken by other components for dealing with faults. Also, one
can typically assume that fault actions will stop for a long enough
time for the program to make progress. Such an assumption is
impossible in this context.

We present algorithms for adding stabilizing fault-tolerance,
failsafe fault-tolerance and masking fault-tolerance. Interestingly,
we observe that the previous approaches for adding stabilizing
fault-tolerance and masking fault-tolerance cannot be easily extended
in this context. However, we find that the overall complexity of
adding these levels of fault-tolerance remains in P (in the state
space of the program). We also demonstrate that our algorithms are
sound and complete.
1. INTRODUCTION

In this paper, we focus on the problem of model repair for the
purpose of making the model stabilizing or fault-tolerant. Model
repair is the problem of revising an existing model/program so that
it satisfies new properties while preserving existing properties. It
is desirable in several contexts




Sandeep Kulkarni 
Computer Science and 
Engineering Department 
Michigan State University 
East Lansing, Michigan 
48824, USA 

sandeep@cse.msu.edu 

such as when an existing program needs to be deployed in a new
setting or to repair bugs. Model repair for fault-tolerance enables
one to separate the fault-tolerance and functionality so that the
designer can focus on the functionality of the program and utilize
automated techniques for adding fault-tolerance. It can also be used
to add fault-tolerance to a newly discovered fault.

This paper focuses on performing such repair when some actions
cannot be removed from the model. We refer to such transitions as
unchangeable environment actions. There are several possible reasons
that actions can be unchangeable. Examples include scenarios where
the system consists of several components, some of which are
developed in house and can be repaired and some of which are
third-party and cannot be changed. Unchangeable actions also arise in
systems such as Cyber-Physical Systems (CPSs), where modifying
physical components may be very expensive or even impossible.

The environment actions differ from fault actions considered in [3].
Fault actions are assumed to be temporary in nature, and all the
previously proposed algorithms to add fault-tolerance in [5] work
only under the important assumption that faults eventually stop
occurring. However, unlike fault actions, environment actions can
keep occurring. Environment actions also differ from adversary
actions considered in [I] or in the context of security intrusions.
In particular, the adversary intends to cause harm to the system. By
contrast, environment actions can be collaborative as well. In other
words, the environment actions are simultaneously collaborative and
disruptive. The goal of this work is to identify whether it is
possible for the program to be repaired so that it can utilize the
assistance provided by them while overcoming their disruption. To
give an intuition of the role of the environment and the difference
between program, environment, and fault actions, we next present the
following example.

An intuitive example to illustrate the role of environment. This
intuitive example is motivated by a simple pressure cooker (see
Figure 1). The environment (heat source) causes the pressure to
increase. In the subsequent discussion, we analyze this pressure
cooker when the heat source is always on. There are two mechanisms to
decrease the pressure, a vent and an overpressure valve. For the sake
of presentation, assume that pressure is below 4 in normal states. If
the pressure increases to 4 or 5, the vent mechanism reduces the
pressure by 1 in each step. However, the vent may fail (e.g., if
something gets stuck at the vent pipe), and its pressure reduction
mechanism becomes disabled. If the pressure reaches 6, the
overpressure valve mechanism causes the valve to open resulting in an
immediate drop in


Figure 1: An intuitive example to illustrate the role of environment
actions. For the sake of readability, fault actions (e.g., actions to
$fs_4$) are removed from the diagram.


pressure to be less than 4. We denote the state where pressure is $a$
by $s_a$ when the vent is working, and by $fs_a$ when the vent has
failed.

Our goal in the subsequent discussion is to model the pressure
cooker as a program and identify an approach for modeling the role of
the environment and its interaction with the program so that we can
conclude this requirement: starting from any state identified above,
the system reaches a state where the pressure is less than 4.

Next, we argue that the role of the environment differs 
from that of fault actions and program actions. In turn, 
this prevents us from using existing approaches such as [5]. 
Specifically, 

• Treating the environment as a fault does not work. In particular,
if we treat the environment as a fault then the transitions from
state $fs_4$ to $fs_5$ and from $fs_5$ to $fs_6$ in Figure 1 are not
required to occur. If these actions do not occur, the overpressure
valve is never activated. Hence, neither the valve nor the vent
mechanism reduces the pressure to be less than 4. Also, faults are
expected to stop. By contrast, this is not the case with the
environment actions.

• Treating the environment transitions similar to program transitions
is also not acceptable. To illustrate this, consider the case where
we want to make changes to the program in Figure 1. For instance, if
the overpressure valve is removed, then this would correspond to
removing the transition from $s_6$ (respectively $fs_6$) to a state
where pressure is less than 4. Also, if we add another safety
mechanism, it would correspond to adding new transitions. However, we
cannot do the same with environment actions that capture the changes
made by the heat source. For example, we cannot add new transitions
(e.g., from $fs_4$ to $s_4$) to the environment, and we cannot remove
transitions (e.g., from $s_4$ to $s_5$). In other words, even if we
make any changes to the model in Figure 1 by adding or removing
safety mechanisms, the transitions marked environment actions remain
unchanged. We cannot introduce new environment transitions and we
cannot remove existing environment transitions. This is what we mean
by the environment being unchangeable.


• Treating the environment to be collaborative without some special
fairness to the program does not work either. In particular, without
some special fairness for the program, the system can cycle through
states $s_4, s_5, s_4, s_5, \ldots$.

• Treating the environment to be simultaneously collaborative as
well as adversarial where the program has some special fairness
enables one to ensure that this program achieves its desired goals.
In particular, we need the environment to be collaborative, i.e., if
it reaches a state where only environment actions can execute then
one of them does execute. (Note that this requirement cannot be
expected of faults.) This is necessary to ensure that the system can
transition from state $fs_4$ to $fs_5$ and from $fs_5$ to $fs_6$,
which is essential for recovery to a state where pressure is less
than 4. We also need the program to have special fairness to require
that it executes faster than the environment so that it does not
execute in a cycle through states $s_4, s_5, s_4, \ldots$. (We will
precisely define the notion of faster in Section 2.1.)

Goal of the paper. Based on the above example, our goal in this
paper is to evaluate how such a simultaneously collaborative and
adversarial environment can be used in adding stabilization, failsafe
fault-tolerance, and masking fault-tolerance to a given program.

Intuitively, in stabilizing fault-tolerance, starting from an
arbitrary state, the program is guaranteed to recover to its
legitimate states. In failsafe fault-tolerance, in the presence of
faults, the program satisfies the safety specification. In masking
fault-tolerance, in addition to satisfying the safety specification,
the program recovers to its legitimate states from where future
specification is satisfied. Also, the results from this work are
applicable for nonmasking fault-tolerance from [2].

We also note that the results in [5] do not model environment
actions. Using the framework in [5] for the above example would
require one to treat the environment actions as fault actions. And,
as discussed above, this leads to an unacceptable result.

Contributions of the paper. The main results of 
this work are as follows: 

• We present two algorithms for addition of stabilization to an
existing program. Of these, the first algorithm is designed for the
case where the program is provided with minimal fairness (where the
program is given a chance to execute at least once between any two
environment actions). The second algorithm, proposed in the Appendix,
is for the case where additional fairness is provided. This algorithm
is especially applicable when adding stabilization with minimal
fairness is impossible. Both these algorithms are sound and complete,
i.e., the program found by them is guaranteed to be stabilizing and
if they declare failure then it implies that adding stabilization to
that program is impossible.

• We present an algorithm for addition of failsafe fault-tolerance.
This algorithm is also sound and complete.

• We present an algorithm for addition of masking fault-tolerance.
This algorithm is also sound and complete.

• We note that the algorithm for masking fault-tolerance can be
easily applied for designing nonmasking fault-tolerance discussed in
[2].










• We show that the complexity of all algorithms presented in this
paper is polynomial (in the state space of the program). Also, we
note that the algorithms for stabilizing and masking fault-tolerance
require one to solve the problem in a completely different fashion
when compared to the case where we have no unchangeable environment
actions.

Organization of the paper. This paper is organized as follows: in
Section 2, we provide the definitions of a program design,
specifications, faults, fault-tolerance, and safe stabilization. In
Section 3, we define the problem of adding safe stabilization, and
propose an algorithm to solve that problem for the case of minimal
fairness. (The algorithm for the case where additional fairness is
provided is proposed in the Appendix.) In Section 4, as a case study,
we illustrate how the algorithm for adding stabilization can be used
for the controller of a smart grid. In Section 5, we define the
problem of adding fault-tolerance, and propose two algorithms to add
failsafe and masking fault-tolerance. In Section 6, we show how our
proposed algorithms can be extended to solve related problems. In
Section 7, we discuss related work. In Section 8, we discuss
application of our algorithms for cyber-physical and distributed
systems. Finally, we make concluding remarks in Section 9.

2. PRELIMINARIES 


Note that the above definition requires that in every step, either a
program transition or an environment transition is executed.
Moreover, after the environment transition executes, the program is
given a chance to execute in the next $k-1$ steps. However, in any
state where no program transition is available, an environment
transition can execute.

Definition 6 (Closure). A state predicate $S$ is closed in a set of
transitions $\delta$ iff $(\forall (s_0, s_1) : (s_0, s_1) \in \delta
: (s_0 \in S \Rightarrow s_1 \in S))$.
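
As a quick illustration of Definition 6, the following Python sketch checks closure for finite, explicitly enumerated sets. The function name `is_closed` and the encoding of a state predicate as a Python set and of transitions as a set of pairs are assumptions made for this example only.

```python
def is_closed(S, delta):
    """Return True iff every transition of delta that starts in S also ends in S."""
    return all(s1 in S for (s0, s1) in delta if s0 in S)


# Hypothetical three-state example: S = {0, 1}.
S = {0, 1}
stays_inside = {(0, 1), (1, 0), (2, 0)}   # (2, 0) starts outside S, so it is irrelevant
leaves_s = {(0, 1), (1, 2)}               # (1, 2) starts in S and ends outside S
print(is_closed(S, stays_inside), is_closed(S, leaves_s))
```

Only transitions that start inside the predicate matter for closure; transitions that enter it from outside are unconstrained.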

2.2 Specification 

Following Alpern and Schneider [7], we let the specification of a
program consist of a safety specification and a liveness
specification.

Definition 7 (Safety). The safety specification is specified in
terms of a set of transitions, $\delta_b$, that the program is not
allowed to execute. Thus, a sequence $\sigma = \langle s_0, s_1,
\ldots \rangle$ refines the safety specification $\delta_b$ iff
$\forall j : 0 \le j < length(\sigma) : (s_j, s_{j+1}) \notin
\delta_b$.

Definition 8 (Liveness). The liveness specification is specified in
terms of a leads-to property, denoted $L \leadsto T$, where both $L$
and $T$ are state predicates. Thus, a sequence $\sigma = \langle s_0,
s_1, \ldots \rangle$ refines the liveness specification iff $\forall
j : L \text{ is true in } s_j : (\exists k : j \le k <
length(\sigma) : T \text{ is true in } s_k)$.
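
The two definitions above can be illustrated on finite sequences with a small Python sketch. The names `refines_safety` and `refines_leads_to` are hypothetical, state predicates are encoded as sets, and the sketch assumes that the state witnessing $T$ may coincide with the state in which $L$ holds. Since it only inspects a finite list, it says nothing about infinite computations.

```python
def refines_safety(seq, delta_b):
    """True iff no consecutive pair of states in seq is a bad transition."""
    return all((seq[j], seq[j + 1]) not in delta_b for j in range(len(seq) - 1))


def refines_leads_to(seq, L, T):
    """True iff every position where L holds is followed (at or after it) by a T state."""
    return all(
        any(s in T for s in seq[j:])
        for j, sj in enumerate(seq)
        if sj in L
    )


# Hypothetical example: the transition (1, 0) is forbidden.
print(refines_safety([0, 1, 2], {(1, 0)}))
print(refines_leads_to([0, 1, 2], L={0}, T={2}))
```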


In this section, we define the notion of programs, faults,
specification and fault-tolerance. We define programs in terms of
their states and transitions. The definition of specification is
based on that by Alpern and Schneider [7]. And, the definitions of
faults and fault-tolerance are adapted from those by Arora and Gouda
[2].

2.1 Program Design Model 

Definition 1 (Program). A program $p$ is of the form $\langle S_p,
\delta_p \rangle$ where $S_p$ is the state space of program $p$, and
$\delta_p \subseteq S_p \times S_p$.

The environment in which the program executes also changes the state
of the program. Instead of modeling this in terms of concepts such as
variables that are written by the program and variables that are
written by the environment, we use a more general approach that
models it as a subset of $S_p \times S_p$. Thus,

Definition 2 (Environment). An environment $\delta_e$ for program
$p$ is defined as a subset of $S_p \times S_p$.

Definition 3 (State Predicate). A state predicate of $p$ is any
subset of $S_p$.

Definition 4 (Projection). The projection of program $p$ on state
predicate $S$, denoted as $p|S$, is the program $\langle S_p, \{(s_0,
s_1) : (s_0, s_1) \in \delta_p \wedge s_0, s_1 \in S\} \rangle$. In
other words, $p|S$ consists of transitions of $p$ that start in $S$
and end in $S$. We denote the set of transitions of $p|S$ by
$\delta_p|S$.

Definition 5 ($p[]_k\delta_e$ computation). Let $p$ be a program
with state space $S_p$ and transitions $\delta_p$. Let $\delta_e$ be
an environment for program $p$ and $k$ be an integer greater than 1.
We say that a sequence $\langle s_0, s_1, s_2, \ldots \rangle$ is a
$p[]_k\delta_e$ computation iff

• $\forall i : i \ge 0 : s_i \in S_p$, and

• $\forall i : i \ge 0 : (s_i, s_{i+1}) \in \delta_p \cup \delta_e$,
and

• $\forall i : i \ge 0 : ((s_i, s_{i+1}) \in \delta_e) \Rightarrow
(\forall l : i < l < i + k : (\exists s'_l :: (s_l, s'_l) \in
\delta_p) \Rightarrow (s_l, s_{l+1}) \in \delta_p)$.
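
To make the fairness constraint of Definition 5 concrete, the sketch below checks the three conditions on a finite prefix of a computation. The function name `is_computation_prefix` and the set-based encoding are assumptions of this example; a finite prefix can only refute membership, it cannot establish that an infinite sequence is a computation.

```python
def is_computation_prefix(seq, states, delta_p, delta_e, k):
    """Check the three conditions of a p []_k delta_e computation on a finite prefix."""
    n = len(seq) - 1                                  # number of transitions in the prefix
    has_program_move = {s for (s, _) in delta_p}      # states with an enabled program transition
    if any(s not in states for s in seq):             # condition 1: all states are in S_p
        return False
    for i in range(n):                                # condition 2: each step is in delta_p or delta_e
        step = (seq[i], seq[i + 1])
        if step not in delta_p and step not in delta_e:
            return False
    for i in range(n):                                # condition 3: fairness after an environment step
        if (seq[i], seq[i + 1]) in delta_e:
            for l in range(i + 1, min(i + k, n)):
                enabled = seq[l] in has_program_move
                if enabled and (seq[l], seq[l + 1]) not in delta_p:
                    return False
    return True


# Hypothetical example with k = 2: after the environment step (0, 1),
# the program transition (1, 0) is enabled and must be taken next.
delta_p = {(1, 0)}
delta_e = {(0, 1), (1, 2)}
print(is_computation_prefix([0, 1, 0], {0, 1, 2}, delta_p, delta_e, 2))
print(is_computation_prefix([0, 1, 2], {0, 1, 2}, delta_p, delta_e, 2))
```

In the second call the environment executes twice in a row although a program transition is enabled in state 1, which is exactly what the third condition rules out for $k = 2$.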


Definition 9. A specification is a tuple $\langle Sf, Lv \rangle$,
where $Sf$ is a safety specification and $Lv$ is a liveness
specification. A sequence $\sigma$ satisfies spec iff it refines $Sf$
and $Lv$.

Definition 10 (Refines). $p[]_k\delta_e$ refines spec from $S$ iff
the following conditions hold:

• $S$ is closed in $\delta_p \cup \delta_e$, and

• Every computation of $p[]_k\delta_e$ that starts from a state in
$S$ refines spec.

We note that from the above definition, it follows that starting
from a state in $S$, execution of either a program action or an
environment action results in a state in $S$. Transitions that start
from a state in $S$ and reach a state outside $S$ will be modeled as
faults (cf. Definition 12).

Definition 11 (Invariant). If $p$ refines spec from $S$ and $S \neq
\emptyset$, we say that $S$ is an invariant of $p$ for spec.

2.3 Faults and Fault-Tolerance 


Definition 12 (Faults). A fault for $p (= \langle S_p, \delta_p
\rangle)$ is a subset of $S_p \times S_p$.


Definition 13 ($p[]_k\delta_e[]f$ computation). Let $p$ be a program
with state space $S_p$ and transitions $\delta_p$. Let $\delta_e$ be
an environment for program $p$, $k$ be an integer greater than 1, and
$f$ be the set of faults for program $p$. We say that a sequence
$\langle s_0, s_1, s_2, \ldots \rangle$ is a $p[]_k\delta_e[]f$
computation iff

• $\forall i : i \ge 0 : s_i \in S_p$, and

• $\forall i : i \ge 0 : (s_i, s_{i+1}) \in \delta_p \cup \delta_e
\cup f$, and

• $\forall i : i \ge 0 : (s_i, s_{i+1}) \in \delta_e \Rightarrow
(\forall l : i < l < i + k : (\exists s'_l :: (s_l, s'_l) \in
\delta_p) \Rightarrow (s_l, s_{l+1}) \in (\delta_p \cup f))$, and

• $\exists n : n \ge 0 : (\forall j : j > n : (s_{j-1}, s_j) \in
(\delta_p \cup \delta_e))$.


The definition of fault-span captures the boundary up to which the
program could be perturbed by faults. Thus,

Definition 14 (Fault-span). $T$ is an $f$-span of $p[]_k\delta_e$
from $S$ iff

• $S \Rightarrow T$, and

• for every computation $\langle s_0, s_1, s_2, \ldots \rangle$ of
$p[]_k\delta_e[]f$, where $s_0 \in S$, $\forall i : s_i \in T$.

A failsafe fault-tolerant program ensures that the safety property
is not violated even if faults occur. In other words, we have

Definition 15 (failsafe f-tolerant). $p[]_k\delta_e$ is failsafe
$f$-tolerant to spec $(= \langle Sf, Lv \rangle)$ from $S$ iff the
following two conditions hold:

• $p[]_k\delta_e$ refines spec from $S$, and

• every computation prefix of $p[]_k\delta_e[]f$ that starts from $S$
refines $Sf$.

In addition to satisfying the safety property, a masking
fault-tolerant program recovers to its invariant.

Definition 16 (masking f-tolerant). $p$ is masking $f$-tolerant to
spec from $S$ iff the following two conditions hold:

• $p[]_k\delta_e$ is failsafe $f$-tolerant to spec, and

• there exists $T$ such that (1) $T$ is an $f$-span of
$p[]_k\delta_e$ from $S$ and (2) for every computation $\sigma (=
\langle s_0, s_1, s_2, \ldots \rangle)$ of $p[]_k\delta_e[]f$ that
starts from a state in $S$, if there exists $i > 0$ such that $s_i
\in T - S$, then there exists $j > i$ such that $s_j \in S$.

Condition (2) above simply means that in any computation which
starts in $S$, when the program leaves $S$, it should return to $S$.

We also define the notion of stabilizing programs. We extend the
definition from m and m by requiring a stabilizing program to satisfy
a certain safety property during recovery. We consider this
generalized notion because it allows us to capture program
restrictions (such as inability to change environment variables) and
because it is useful in our design of the algorithm for adding
masking fault-tolerance. The traditional definition of stabilization
is obtained by setting $\delta_b$ in the following definition to be
the empty set.

Definition 17 (Safe Stabilization). $p[]_k\delta_e$ is
$\delta_b$-safe stabilizing for invariant $S$ iff the following
conditions hold:

• $S$ is closed in $\delta_p \cup \delta_e$, and

• for any $p[]_k\delta_e$ computation $\langle s_0, s_1, s_2, \ldots
\rangle$ there does not exist $l$ such that $(s_l, s_{l+1}) \in
\delta_b$, and

• for any $p[]_k\delta_e$ computation $\langle s_0, s_1, s_2, \ldots
\rangle$ there exists $l$ such that $s_l \in S$.

Remark 1. The notion of safe stabilization has been viewed from
different angles in the literature. In authors consider the case
where the program reaches an acceptable state quickly and converges
to legitimate states after a longer time. By contrast, our notion
simply requires that certain transitions (that violate the safety
specification) cannot be executed during recovery.

3. ADDITION OF SAFE STABILIZATION 

In this section, we present our algorithm for adding safe
stabilization to an existing program. In Section 3.1, we identify the
problem statement. In Section 3.2, we present our algorithm for the
case where the parameter $k$ (that identifies the fairness between
program and environment actions) is set to 2. Due to reasons of
space, the algorithm for arbitrary value of $k$ is presented in the
Appendix.


3.1 Problem Definition 

The problem for adding safe stabilization begins with a program $p$,
its invariant $S$, and a safety specification $\delta_b$ that
identifies the set of bad transitions. The goal is to add
stabilization so that starting from an arbitrary state, the program
recovers to $S$. Moreover, we want to ensure that during recovery the
program does not execute any transition in $\delta_b$. Also, we want
to make sure that the execution of environment actions cannot prevent
recovery to $S$. Thus, the problem statement is as follows:

Given program $p$ with state space $S_p$ and transitions $\delta_p$,
state predicate $S$, set of bad transitions $\delta_b$, environment
$\delta_e$, and $k > 1$, identify $p'$ with state space $S_p$ such
that:

• $p'|S = p|S$

• $p'[]_k\delta_e$ is $\delta_b$-safe stabilizing for invariant $S$

3.2 Addition of Safe Stabilization 

In this section, we present an algorithm for the problem of addition
of stabilization defined in Section 3.1. The algorithm proposed here
adds stabilization for $k = 2$. When $k = 2$, the environment
transition can execute immediately after any program transition. By
contrast, for larger $k$, the environment transitions may have to
wait until the program has executed $k-1$ transitions. Observe that
if $\delta_b \cap \delta_e$ is nonempty then adding stabilization is
impossible. This is due to the fact that if the program starts in a
state where such a transition can execute then it can immediately
violate safety. Hence, this algorithm (but not the algorithms for
adding failsafe and masking fault-tolerance) assumes that $\delta_b
\cap \delta_e = \emptyset$.

The algorithm for adding stabilization is as shown in Algorithm 1.
In this algorithm, $\delta'_p$ is the set of transitions of the final
stabilizing program. Inside the invariant, the transitions should be
equal to the original program. Therefore, in the first line, we set
$\delta'_p$ to $\delta_p|S$. State predicate $R$ is the set of states
such that every computation starting from $R$ has a state in $S$.
Initially (Line 2), $R$ is initialized to $S$. In each iteration,
state predicate $R_p$ is the set of states that can reach a state in
$R$ using a safe program transition, i.e., a transition not in
$\delta_b$. In Line 7, we add such program transitions to
$\delta'_p$.

In the loop on Lines 9-11, we add more states to $R$. We add $s_0$
to $R$ (Line 10) whenever every computation starting from $s_0$ has a
state in $S$. A state $s_0$ can be added to $R$ only when there is no
environment transition starting from $s_0$ and going to a state
outside $R \cup R_p$. In addition to this condition, there should be
at least one transition from $s_0$ that reaches $R$. The loop on
Lines 3-12 terminates if no state is added to $R$ in the last
iteration. Upon termination of the loop, the algorithm declares
failure to add stabilization if there exists a state outside $R$.
Otherwise, it returns $\delta'_p$ as the set of transitions of the
stabilizing program.

We use Figure 2 to illustrate Algorithm 1. Figure 2 depicts the
status of the state space in a hypothetical $i$-th iteration of the
loop on Lines 3-12. In this iteration state $A$ is added to $R$. This
is due to the fact that (1) there is at least one transition from $A$
(namely $(A, F)$) that reaches $R$ and (2) there is no environment
transition from $A$ that reaches outside $R \cup R_p$. Likewise,
state $C$ is also added to $R$. State $B$ is not added to $R$ due to
environment transition $(B, E)$. Likewise, state $D$ is also not
added to $R$. State $E$ is not added to $R$ since there is no
transition from $E$ to a state in $R$.

In the next, i.e., $(i+1)$-th, iteration, $E$ is added to $R$ since






Figure 2: Illustration of how $R$ expands in Algorithm 1


there is a transition $(E, A)$ and $A$ was added to $R$ in the
$i$-th iteration. Continuing this, $D$ is added in the $(i+2)$-th
iteration.


Algorithm 1 Addition of safe stabilization
Input: $S_p$, $\delta_p$, $\delta_e$, $S$, and $\delta_b$
Output: $\delta'_p$ or Not-Possible

 1: $\delta'_p := (\delta_p|S)$;
 2: $R = S$;
 3: repeat
 4:    $R' = R$;
 5:    $R_p = \{s_0 \mid s_0 \notin R \wedge \exists s_1 : s_1 \in R : (s_0, s_1) \notin \delta_b\}$;
 6:    for each $s_0 \in R_p$ do
 7:       $\delta'_p = \delta'_p \cup \{(s_0, s_1) \mid (s_0, s_1) \notin \delta_b \wedge s_1 \in R\}$;
 8:    end for
 9:    for each $s_0 \notin R \mid \nexists s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e \wedge (\exists s_1 : s_1 \in (R \cup R_p) : (s_0, s_1) \in \delta_e \vee s_0 \in R_p)$ do
10:       $R = R \cup s_0$;
11:    end for
12: until ($R' = R$);
13: if $\exists s_0 \notin R$ then
14:    return 'Not-Possible';
15: else
16:    return $\delta'_p$;
17: end if
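
The following Python sketch is one possible executable reading of Algorithm 1 for $k = 2$ on small, explicitly enumerated state spaces. It is not the authors' implementation: the function name `add_safe_stabilization`, the set-of-pairs encoding, and the choice to evaluate the condition of Line 9 against the value of $R$ at the start of the iteration (rather than updating it in place) are assumptions of this example. The comments map each step back to the line numbers of the listing.

```python
def add_safe_stabilization(states, delta_p, delta_e, S, delta_b):
    """Return the transitions of a repaired program, or None if the sketch declares failure."""
    states, S = set(states), set(S)
    # Line 1: inside the invariant, keep exactly the original transitions.
    new_delta_p = {(a, b) for (a, b) in delta_p if a in S and b in S}
    R = set(S)                                        # Line 2
    while True:                                       # Lines 3-12
        R_prev = set(R)                               # Line 4
        # Line 5: states outside R with a safe one-step program move into R.
        R_p = {s0 for s0 in states - R_prev
               if any((s0, s1) not in delta_b for s1 in R_prev)}
        # Lines 6-8: add those safe recovery transitions.
        for s0 in R_p:
            new_delta_p |= {(s0, s1) for s1 in R_prev
                            if (s0, s1) not in delta_b}
        # Lines 9-11: a state joins R if the environment cannot drag it
        # outside R u R_p and some transition makes progress.
        good = R_prev | R_p
        for s0 in states - R_prev:
            env_succ = {t for (s, t) in delta_e if s == s0}
            no_escape = env_succ <= good
            can_progress = bool(env_succ & good) or s0 in R_p
            if no_escape and can_progress:
                R.add(s0)                             # Line 10
        if R == R_prev:                               # Line 12
            break
    # Lines 13-17: fail if some state never made it into R.
    return None if states - R else new_delta_p


# Hypothetical four-state example with invariant S = {0}.
repaired = add_safe_stabilization(
    states={0, 1, 2, 3},
    delta_p={(0, 0)},
    delta_e={(2, 1)},
    S={0},
    delta_b={(2, 0), (3, 0), (3, 1)},
)
print(repaired)
```

In this toy instance state 2 has no safe program transition into the invariant, but it is still admitted to $R$ because its only environment transition leads to state 1, from which the program recovers; this is the assistive role of the environment described in the introduction.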


Theorem 1. Algorithm 1 is sound and complete. And, its complexity is
polynomial.

For reasons of space, we provide the proofs in the Appendix.

4. CASE STUDY: STABILIZATION OF SMART 
GRID 

In this section we illustrate how Algorithm 1 is used to add safe
stabilization to a controller program of a smart grid. We consider an
abstract version of the smart grid described in [18] (see Figure 3).
In this example, the system consists of a generator $G$ and two loads
$Z_1$ and $Z_2$. There are three sensors in the system. Sensor $G$
shows the power generated by the generator, and sensors 1 and 2 show
the demand of loads $Z_1$ and $Z_2$, respectively. The goal is to
ensure that proper load shedding is used if the load is too high
(respectively, generating capacity is too low).



Figure 3: Elementary single generator smart grid 
system 


The control center is shown by a dashed circle in Figure 3. It can
read the values of the sensors and turn on/off switches connected to
the loads. The program of the control center should control switches
in a manner that all the conditions below are satisfied:

1. Both switches should be turned on if the overall sensed 
load is less than or equal to the generation capacity. 

2. If sensor values reveal that neither load can individually be
served by G then both are shed.

3. If only one load can be served then the smaller load is 
shed assuming the larger load can be served by G. 

4. If only one can be served and the larger load exceeds 
the generation capacity, the smaller load is served. 

4.1 Program Model 

We model the program of the smart grid shown in Figure 3 by program
$p$, which has five variables as follows:

$V_G$ : The value of sensor G.

$V_1$ : The value of sensor 1.

$V_2$ : The value of sensor 2.

$w_1$ : The status of switch 1.

$w_2$ : The status of switch 2.

The value of each sensor is an integer in the range $[0, max]$. And,
the status of each switch is a Boolean.

The invariant $S$ for this program includes all the states which are
legitimate according to the conditions 1-4 mentioned above.
Therefore, $S$ is the union of state predicates $I_1$ to $I_6$ as
follows (see footnote 1):

Footnote 1: We need to add $0 \le V_1, V_2, V_G \le max$ to all
conditions. For brevity, we keep these implicit.




































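$I_1 = (V_1 + V_2 \le V_G) \wedge (w_1 \wedge w_2)$

$I_2 = (V_1 \le V_G \wedge V_2 > V_G) \wedge (w_1 \wedge \neg w_2)$

$I_3 = (V_1 > V_G \wedge V_2 \le V_G) \wedge (\neg w_1 \wedge w_2)$

$I_4 = (V_1 > V_G \wedge V_2 > V_G) \wedge (\neg w_1 \wedge \neg w_2)$

$I_5 = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 \le V_2) \wedge (\neg w_1 \wedge w_2)$

$I_6 = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 > V_2) \wedge (w_1 \wedge \neg w_2)$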

Observation 1. For any value of $V_1$, $V_2$, and $V_G$, there
exists an assignment to $w_1$ and $w_2$ such that the resulting state
is in $S$.
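
Observation 1 can be checked by brute force for small sensor ranges. The sketch below encodes the six invariant predicates under the assumption that the comparisons are non-strict where conditions 1-4 say "less than or equal" or "can be served", and that a tie between the two loads is resolved by shedding load 1. The function names and the bound `max_value` are hypothetical.

```python
from itertools import product


def in_invariant(v1, v2, vg, w1, w2):
    """True iff the state satisfies one of the predicates I_1 to I_6."""
    both_fit = v1 <= vg and v2 <= vg and v1 + v2 > vg
    i1 = v1 + v2 <= vg and w1 and w2
    i2 = v1 <= vg and v2 > vg and w1 and not w2
    i3 = v1 > vg and v2 <= vg and not w1 and w2
    i4 = v1 > vg and v2 > vg and not w1 and not w2
    i5 = both_fit and v1 <= v2 and not w1 and w2
    i6 = both_fit and v1 > v2 and w1 and not w2
    return bool(i1 or i2 or i3 or i4 or i5 or i6)


def legitimate_switch_settings(v1, v2, vg):
    """All (w1, w2) assignments that place the given sensor values in the invariant."""
    return [(w1, w2) for w1, w2 in product([False, True], repeat=2)
            if in_invariant(v1, v2, vg, w1, w2)]


def observation_1_holds(max_value):
    """Check that every sensor valuation in [0, max_value]^3 has a legitimate setting."""
    return all(legitimate_switch_settings(v1, v2, vg)
               for v1, v2, vg in product(range(max_value + 1), repeat=3))


print(observation_1_holds(4))
```

Under this encoding the sensor conditions of the six predicates partition the sensor valuations, so each valuation admits exactly one legitimate switch setting.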

The values of sensors can change by environment transitions. In
addition, the environment can keep the current value of a sensor by
self-loop environment transitions. However, the environment cannot
change the status of switches. Thus, the set of environment
transitions, $\delta_e$, is equal to $\{(s_0, s_1) \mid (w_1(s_0) =
w_1(s_1)) \wedge (w_2(s_0) = w_2(s_1))\}$, where $w_i(s_j)$ shows the
status of switch $i$ in state $s_j$.

The program cannot change the value of any sensor. Thus, the set of
bad transitions, $\delta_b$, for this program is equal to $\{(s_0,
s_1) \mid V_G(s_0) \neq V_G(s_1) \vee V_1(s_0) \neq V_1(s_1) \vee
V_2(s_0) \neq V_2(s_1)\}$, where $V_i(s_j)$ shows the value of the
variable $V_i$ in state $s_j$.

For the sake of presentation and to illustrate the role of $k$, we
also assume that the program cannot change the status of more than
one switch in one transition. For this case, we add more transitions
to the set of bad transitions. We call the set of bad transitions for
this case $\delta_{b_2}$ and it is equal to $\{(s_0, s_1) \mid
V_G(s_0) \neq V_G(s_1) \vee V_1(s_0) \neq V_1(s_1) \vee V_2(s_0) \neq
V_2(s_1) \vee (w_1(s_0) \neq w_1(s_1) \wedge w_2(s_0) \neq
w_2(s_1))\}$.

4.2 Adding Stabilization 

Here, we apply Algorithm 1 to add stabilization to program $p$
defined in Section 4.1. We illustrate the result of applying
Algorithm 1 for two sets of bad transitions, $\delta_b$ and
$\delta_{b_2}$.


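predicate $R_p$ is the union of state predicates $R_{p_1}$ to
$R_{p_6}$ as follows ($\oplus$ denotes the xor operation):

$R_{p_1} = (V_1 + V_2 \le V_G) \wedge (w_1 \oplus w_2)$

$R_{p_2} = (V_1 \le V_G \wedge V_2 > V_G) \wedge (w_1 \oplus \neg w_2)$

$R_{p_3} = (V_1 > V_G \wedge V_2 \le V_G) \wedge (\neg w_1 \oplus w_2)$

$R_{p_4} = (V_1 > V_G \wedge V_2 > V_G) \wedge (\neg w_1 \oplus \neg w_2)$

$R_{p_5} = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 \le V_2) \wedge (\neg w_1 \oplus w_2)$

$R_{p_6} = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 > V_2) \wedge (w_1 \oplus \neg w_2)$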

Similarly, $\neg(R \cup R_p)$ includes every state that is outside
$S$ and from which more than one step is needed to reach a state in
$S$. Therefore, state predicate $\neg(R \cup R_p)$ is the union of
state predicates $R'_{p_1}$ to $R'_{p_6}$ as follows:

$R'_{p_1} = (V_1 + V_2 \le V_G) \wedge (\neg w_1 \wedge \neg w_2)$

$R'_{p_2} = (V_1 \le V_G \wedge V_2 > V_G) \wedge (\neg w_1 \wedge w_2)$

$R'_{p_3} = (V_1 > V_G \wedge V_2 \le V_G) \wedge (w_1 \wedge \neg w_2)$

$R'_{p_4} = (V_1 > V_G \wedge V_2 > V_G) \wedge (w_1 \wedge w_2)$

$R'_{p_5} = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 \le V_2) \wedge (w_1 \wedge \neg w_2)$

$R'_{p_6} = (V_1 + V_2 > V_G \wedge V_1 \le V_G \wedge V_2 \le V_G \wedge V_1 > V_2) \wedge (\neg w_1 \wedge w_2)$

Now, observe that for any status of switches, there exists a state
in $\neg(R \cup R_p)$. That means from any state in $S_p$ it is
possible to reach a state in $\neg(R \cup R_p)$ without changing the
value of switches using an environment transition. Therefore, no
state is added to $R$ in the first iteration, and the loop on Lines
3-12 terminates in the first iteration. Since all the states outside
$S$ remain in $\neg R$, the algorithm declares that no solution to
the addition problem exists. Therefore, according to the completeness
of Algorithm 1, there does not exist any $\delta_{b_2}$-safe
stabilizing program for the smart grid described in this section when
$k$ is equal to 2. This is expected since the only solution for this
problem requires changing both switches simultaneously before the
environment is able to disrupt it again. This problem does have a
solution for $k = 3$, but we omit its derivation for lack of space.


4.2.1 Adding Stabilization for $\delta_b$

At the beginning of Algorithm 1, $R$ is initialized with $S$. In the
first iteration of the loop on Lines 3-12, $R_p$ is the set of states
outside $S$ that can reach a state in $S$ with only one program
transition. A program transition cannot change the value of any
sensor.

According to Observation 1, from each state in $\neg S$ it is
possible to reach a state in $S$ by changing the status of switches.
Therefore, the following set of transitions is added to $\delta'_p$
by Line 7:

$\{(s_0, s_1) \mid V_1(s_0) = V_1(s_1) \wedge V_2(s_0) = V_2(s_1)
\wedge V_G(s_0) = V_G(s_1) \wedge s_0 \notin \bigcup_{i=1}^{6} I_i
\wedge s_1 \in \bigcup_{i=1}^{6} I_i\}$

Since every state in $\neg S$ ($\neg R$) is in $R_p$, there does not
exist any environment transition starting from any state to a state
in $\neg(R \cup R_p)$. Therefore, all the states in $\neg R$ are
added to $R$ by Line 10.

In the second iteration no more states are added to $R$. Thus, the
loop on Lines 3-12 terminates. Since there is no state in $\neg R$,
the algorithm returns $\delta'_p$ as the transitions of the resulting
$\delta_b$-safe stabilizing program for $S$.

4.2.2 Adding Stabilization for $\delta_{b_2}$

At the beginning of Algorithm 1, $R$ is initialized with $S$. In the
first iteration of the loop on Lines 3-12, $R_p$ is the set of states
outside $S$ that can reach a state in $S$ with only one program
transition. A program transition cannot change the value of any
sensor. In addition, according to $\delta_{b_2}$, it cannot change
the status of both switches. Therefore, state


5. ADDITION OF FAULT-TOLERANCE 

In this section, we present our algorithms for adding failsafe and
masking fault-tolerance. In Section 5.1, we identify the problem
statement for adding these levels of fault-tolerance. In Section 5.2,
we present our algorithm for adding failsafe fault-tolerance. Section
5.3 presents an algorithm for adding masking fault-tolerance.
Finally, we show that the same algorithm can be used for adding
nonmasking fault-tolerance considered in [2].

5.1 Problem Definition 

In addition to the set of bad transitions $\delta_b$ that we
used for providing safe stabilization, in this case we
introduce an additional parameter $\delta_r$ that identifies
additional restrictions on program transitions. As an example,
consider the case where a program cannot change the value of a
sensor, i.e., it can only read it. However, the environment
can change the value of the sensor. In this case, transitions
that change the value of the sensor are disallowed as program
transitions, but they are acceptable as environment
transitions. Note that this parameter was not necessary in
Section 3, since we could simply add these transitions to
$\delta_b$, i.e., the transitions that violate safety. This is
acceptable since adding stabilization requires
$\delta_b \cap \delta_e = \emptyset$. However, adding failsafe
or masking fault-tolerance is possible even if
$\delta_b \cap \delta_e \neq \emptyset$. Hence, we add the
parameter $\delta_r$ explicitly. The problem statement for
addition of fault-tolerance is as follows:



Given $p$, $\delta_e$, $S$, $spec$, set of program restrictions
$\delta_r$, $k > 1$, and $f$ such that $p []_k \delta_e$ refines
$spec$ from $S$, and $\delta_p \cap \delta_r = \emptyset$,
identify $p'$ and $S'$ such that:

• C1: every computation of $p' []_k \delta_e$ that starts in a
state in $S'$ is a computation of $p []_k \delta_e$ that starts
in $S$, and

• C2: $p' []_k \delta_e$ is failsafe (respectively, masking)
$f$-tolerant to $spec$ from $S'$, and

• C3: $\delta'_p \cap \delta_r = \emptyset$.

The problem statement requires that the program does not
introduce new behaviors in the absence of faults (Constraint
C1), provides the desired fault-tolerance (Constraint C2), and
does not include a transition in $\delta_r$ (Constraint C3).

Assumption 1. For simplicity of the algorithms and their
proofs, we assume that there are no deadlocks in
$\delta_p [] \delta_e$ in any state in $S$. In other words, for
any $s_0$ in $S$, there exists a state $s_1$ in $S$ such that
$(s_0, s_1)$ is in $\delta_p \cup \delta_e$. If this is not
true, then we can add self-loops to the offending states, i.e.,
the states in
$\{s_0 \mid s_0 \in S \wedge \forall s_1 :: (s_0, s_1) \notin \delta_p \cup \delta_e\}$.
After the fault-tolerant program is obtained, we remove these
self-loops. We note that this does not affect either soundness
or completeness of any of our algorithms.
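
The preprocessing in Assumption 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: states are hashable values, transition relations are sets of pairs, the self-loops are recorded as program transitions, and the function names `add_self_loops` and `remove_self_loops` are hypothetical (they do not appear in the paper).

```python
def add_self_loops(S, delta_p, delta_e):
    """Add a self-loop to every state of S with no outgoing transition.

    Assumption: the loops are stored with the program transitions.
    Returns the padded transition set and the loops that were added.
    """
    has_successor = {s0 for (s0, _s1) in delta_p | delta_e}
    deadlocked = {s for s in S if s not in has_successor}
    loops = {(s, s) for s in deadlocked}
    return delta_p | loops, loops


def remove_self_loops(delta_p_revised, loops):
    """Drop the artificial self-loops once the revised program is built."""
    return delta_p_revised - loops
```

Keeping the added loops in a separate set makes the final removal step exact, since genuine self-loops of the original program are never touched.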

5.2 Adding Failsafe Fault-Tolerance 

The algorithm for adding failsafe fault-tolerance for $k = 2$
is shown in Algorithm 2. In this algorithm, $ms_1$ is the set
of states such that, no matter how they are reached, there
exists a computation suffix starting from them that violates
safety. Set $ms_2$ is the set of states such that, if they are
reached by a program or fault transition, there exists a
computation suffix starting from them that violates safety.
Note that $ms_2$ always includes $ms_1$. Initially, $ms_1$ is
set to $\{s_0 \mid (s_0, s_1) \in f \cap \delta_b\}$, and
$ms_2$ is set to
$ms_1 \cup \{s_0 \mid \exists s_1 :: (s_0, s_1) \in \delta_e \cap \delta_b\}$
by Lines 1 and 2. Set $mt$ is the set of transitions that the
final program cannot have, because they are in
$\delta_b \cup \delta_r$ or reach a state in $ms_2$.

In the loop on Lines 4-10, more states are added to $ms_1$ and
$ms_2$. Consequently, $mt$ must be updated. A state $s_0$ is
added to $ms_1$ by Line 7 in two cases: (1) there exists a
fault transition starting from $s_0$ that reaches a state in
$ms_2$, or (2) there exists an environment transition
$(s_0, s_1)$ such that $(s_0, s_1)$ is a bad transition or
$s_1 \in ms_1$, and every transition starting from $s_0$ is in
$mt$ (i.e., $(s_0, s_2) \in mt$ for every $s_2$).

A state is added to $ms_2$ by Line 8 if it is in $ms_1$ or if
it has an environment transition to a state in $ms_1$. We
update $mt$ by Line 9 to include transitions to the states
newly added to $ms_2$. The loop on Lines 4-10 terminates when
no state is added to $ms_1$ or $ms_2$ in an iteration.
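
The fixpoint computed by Lines 1-10 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes an explicit finite state space, represents every transition relation as a set of pairs, and the name `unsafe_states` is hypothetical. Membership in $mt$ is evaluated lazily through a helper, and the set itself is materialized only at the end.

```python
def unsafe_states(states, f, delta_e, delta_b, delta_r):
    """Sketch of Lines 1-10 of Algorithm 2 over an explicit state space."""
    ms1 = {s0 for (s0, s1) in f & delta_b}
    ms2 = ms1 | {s0 for (s0, s1) in delta_e & delta_b}

    def in_mt(s0, s1):
        # (s0, s1) is disallowed if it is bad, restricted, or enters ms2.
        return (s0, s1) in delta_b or (s0, s1) in delta_r or s1 in ms2

    while True:
        old1, old2 = set(ms1), set(ms2)
        # Case 1: a fault transition leads into ms2.
        by_fault = {s0 for (s0, s1) in f if s1 in ms2}
        # Case 2: a harmful environment transition exists and every
        # possible program transition from s0 is disallowed.
        by_env = {s0 for (s0, s1) in delta_e
                  if (s1 in ms1 or (s0, s1) in delta_b)
                  and all(in_mt(s0, s2) for s2 in states)}
        ms1 |= by_fault | by_env
        ms2 |= ms1 | {s0 for (s0, s1) in delta_e if s1 in ms1}
        if ms1 == old1 and ms2 == old2:
            break

    mt = {(s0, s1) for s0 in states for s1 in states if in_mt(s0, s1)}
    return ms1, ms2, mt
```

Both sets only grow and are bounded by the state space, so the loop terminates after at most a polynomial number of iterations, in line with the complexity claim of the section.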

Then, we focus on creating the new invariant, $S'$, for the
revised program. $S'$ cannot include any state in $ms_2$,
because starting from any state in $ms_2$ there is a
computation that violates safety. In addition, the set of
program transitions of the revised program, $\delta'_p$, cannot
include any transition in $mt$, because a transition in $mt$
is itself disallowed or reaches a state in $ms_2$. Thus, we
initialize $\delta'_p$ with $\delta_p|S - mt$. Note that $S'$
should be closed in $p' []_2 \delta_e$. In addition, according
to Assumption 1, $S'$ cannot include any deadlock state. Thus,
whenever we remove a state from $S'$, we ensure these
conditions by calling the RemoveDeadlock and EnsureClosure
functions.
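
The two helper functions can be sketched in Python as follows, following the listing of Algorithm 2 (Lines 25-33). The sketch assumes sets of states and sets of transition pairs; the snake_case names are a Python rendering and not the paper's notation.

```python
def remove_deadlock(S, delta_p, delta_e):
    """Shrink S until every state has a program successor inside S
    and no environment transition leaves S (Lines 25-31)."""
    S = set(S)
    while True:
        before = set(S)
        # Line 28: drop states without a program transition into S.
        S -= {s0 for s0 in S
              if not any((s0, s1) in delta_p for s1 in S)}
        # Line 29: drop states that the environment can move out of S.
        S -= {s0 for (s0, s1) in delta_e if s0 in S and s1 not in S}
        if S == before:
            return S


def ensure_closure(delta_p, S):
    """Remove program transitions that start in S and end outside S
    (Lines 32-33)."""
    return {(s0, s1) for (s0, s1) in delta_p
            if not (s0 in S and s1 not in S)}
```

Environment transitions cannot be removed, which is why a state with an environment transition leaving the candidate invariant has to be dropped from the invariant itself, whereas an offending program transition is simply deleted.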

Note that, according to Constraint C1 of the addition problem
defined in Section 5.1, the set of computations of the revised
program inside its invariant should be a subset of the set of
computations of the original program inside its invariant.
Thus, the revised program cannot have any new computation
starting from its invariant. In the loop on Lines 13-22, we
remove states from $S'$ to avoid creating such new
computations.


Algorithm 2 Adding Failsafe Fault-Tolerance

Input: $S_p$, $\delta_p$, $\delta_e$, $S$, $\delta_b$, $\delta_r$, $k$, and $f$
Output: $(\delta'_p, S')$ or Not-possible

 1: $ms_1 = \{s_0 \mid (s_0, s_1) \in f \cap \delta_b\}$;
 2: $ms_2 = ms_1 \cup \{s_0 \mid \exists s_1 :: (s_0, s_1) \in \delta_e \cap \delta_b\}$;
 3: $mt = \{(s_0, s_1) \mid (s_0, s_1) \in (\delta_b \cup \delta_r) \vee s_1 \in ms_2\}$;
 4: repeat
 5:   $ms'_1 = ms_1$;
 6:   $ms'_2 = ms_2$;
 7:   $ms_1 = ms_1 \cup \{s_0 \mid \exists s_1 : s_1 \in ms_2 : (s_0, s_1) \in f\} \cup \{s_0 \mid (\exists s_1 :: (s_1 \in ms_1 \wedge (s_0, s_1) \in \delta_e) \vee (s_0, s_1) \in \delta_e \cap \delta_b) \wedge (\forall s_2 :: (s_0, s_2) \in mt)\}$;
 8:   $ms_2 = ms_2 \cup ms_1 \cup \{s_0 \mid \exists s_1 : s_1 \in ms_1 : (s_0, s_1) \in \delta_e\}$;
 9:   $mt = \{(s_0, s_1) \mid (s_0, s_1) \in (\delta_b \cup \delta_r) \vee s_1 \in ms_2\}$;
10: until ($ms'_1 = ms_1 \wedge ms'_2 = ms_2$)
11: $\delta'_p = \delta_p|S - mt$;
12: $S' = RemoveDeadlock(S - ms_2, \delta'_p, \delta_e)$;
13: repeat
14:   if $S' = \emptyset$ then
15:     return Not-possible;
16:   end if
17:   $S'' = S'$;
18:   $\delta'_p = EnsureClosure(\delta'_p, S')$;
19:   $ms_3 = \{s_0 \mid (\exists s_1, s_2 :: (s_0, s_1) \in \delta_e \wedge (s_0, s_2) \in \delta_p) \wedge (\nexists s_3 :: (s_0, s_3) \in \delta'_p)\}$;
20:   $ms_4 = \{s_0 \mid \exists s_1 :: (s_1 \in ms_3 \wedge (s_0, s_1) \in \delta_e)\}$;
21:   $S' = RemoveDeadlock(S' - ms_4, \delta'_p, \delta_e)$;
22: until ($S'' = S'$)
23: $\delta'_p = (\delta'_p \cup ((S_p - S') \times S_p)) - mt$;
24: return $(\delta'_p, S')$;

25: RemoveDeadlock($S$, $\delta_p$, $\delta_e$)
26: repeat
27:   $S' = S$;
28:   $S = S - \{s_0 \mid (\forall s_1 : s_1 \in S : (s_0, s_1) \notin \delta_p)\}$;
29:   $S = S - \{s_0 \mid \exists s_1 :: (s_0, s_1) \in \delta_e \wedge s_0 \in S \wedge s_1 \notin S\}$;
30: until ($S' = S$)
31: return $S$;

32: EnsureClosure($p$, $S$)
33: return $p - \{(s_0, s_1) :: s_0 \in S \wedge s_1 \notin S\}$;


Consider a state $s_0$ from which there exists an environment
transition $(s_0, s_1)$. In addition, there exists a program
transition $(s_0, s_2)$ in the set of program transitions of
the original program, $\delta_p$. Set $ms_3$ includes any such
state $s_0$ that has no program transition left in the revised
program. If $s_0$ is reached by an environment transition
$(s_3, s_0)$, then in the original program, according to the
fairness assumption, $(s_0, s_1)$ cannot occur next. Thus, the
sequence $\langle s_3, s_0, s_1 \rangle$ cannot be in any
computation of $p []_2 \delta_e$. However, if we remove the
program transition $(s_0, s_2)$ in the revised program,
$\langle s_3, s_0, s_1 \rangle$ can be in a computation of
$p' []_2 \delta_e$. Therefore, we should remove any state like
$s_3$ from the invariant. Set $ms_4$ includes any state like
$s_3$.
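
The computation of $ms_3$ and $ms_4$ described above (Lines 19-20 of Algorithm 2) can be sketched in Python as follows. This is an illustrative sketch that assumes the reading in which $ms_3$ contains states that had a program transition originally but have none in the revised program; the function name `new_behavior_states` is hypothetical.

```python
def new_behavior_states(delta_p, delta_p_revised, delta_e):
    """Return (ms3, ms4) for the original and revised program transitions."""
    env_sources = {s0 for (s0, _s1) in delta_e}
    orig_sources = {s0 for (s0, _s1) in delta_p}
    revised_sources = {s0 for (s0, _s1) in delta_p_revised}
    # ms3: environment and original program could both act here,
    # but the revised program has no transition left.
    ms3 = (env_sources & orig_sources) - revised_sources
    # ms4: states from which the environment can move into ms3.
    ms4 = {s0 for (s0, s1) in delta_e if s1 in ms3}
    return ms3, ms4
```

With $k = 2$, a state in $ms_4$ is the start of a two-step environment sequence that the original program would have interrupted, which is why such states are removed from $S'$.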

After creating the invariant $S'$, we add program transitions
outside it to $\delta'_p$. Note that outside $S'$, any program
transition that is not in $mt$ is allowed to exist in the final
program. In Line 15, the algorithm declares that no solution
to the addition problem exists if $S'$ is empty. Otherwise, at
the end of the algorithm, it returns $(\delta'_p, S')$ as the
solution to the addition problem.

Theorem 2. Algorithm 2 is sound and complete, and its
complexity is polynomial.

For reasons of space, we provide the proofs in the Appendix.

5.3 Adding Masking Fault-Tolerance 

In this section, we present the algorithm for adding masking
fault-tolerance in the presence of unchangeable environment
actions. The intuition behind this algorithm is as follows.
First, we utilize the ideas from adding stabilization.
Intuitively, in Algorithm 1 we constructed the set $R$ from
which recovery to the invariant ($S$) was possible. In the case
of stabilization, we wanted to ensure that $R$ includes all
states. However, for masking, this is not necessary. Also, the
algorithm for adding stabilization does not use faults as
input. Hence, we need to ensure that recovery from $R$ is not
prevented by faults. This may require us to prevent the
program from reaching some states in $R$. Hence, this process
needs to be repeated to identify a set $R$ such that recovery
to $S$ is provided and faults do not cause the program to
reach a state outside $R$. In addition, in masking
fault-tolerance, as in failsafe fault-tolerance, the program
should refine the safety part of $spec$ even in the presence
of faults. The details of Algorithm 3 are as follows.

In this algorithm, both Algorithm 1 and Algorithm 2 are used,
with some modifications, in the loop on Lines 3-40. First, in
the loop on Lines 6-15, we build the set $R$, which includes
all states from which all computations reach a state in $S$.
In addition, all required program transitions are added to
$\delta'_p$ by Line 10. When the loop on Lines 6-15
terminates, we set $ms_1$ to $\neg(R \cup R_p)$, because a
state in $\neg(R \cup R_p)$ should not be reached by any
program, fault, or environment transition. We also set $ms_2$
to $\neg R$, as a state in $\neg R$ should not be reached by
any fault or program transition. Then, by Lines 18-27, we
expand $ms_1$ and $ms_2$ in the same way as in Algorithm 2. In
Line 28, we remove any transition in $mt$ from $\delta'_p$, as
any transition in $mt$ reaches a state in $ms_2$, and a
program transition should not reach a state in $ms_2$.
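
The construction of $R$ and $R_p$ in the inner loop (Lines 6-15) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the state space is an explicit finite set, the accumulation of recovery transitions into $\delta'_p$ (Line 10) is omitted, and the function name `recovery_region` is hypothetical.

```python
def recovery_region(states, S, delta_e, delta_b, delta_r):
    """Return (R, R_p) at the fixpoint of the inner loop."""
    R = set(S)
    forbidden = delta_b | delta_r
    while True:
        before = set(R)
        # R_p: states outside R with an allowed one-step move into R.
        Rp = {s0 for s0 in states - R
              if any((s0, s1) not in forbidden for s1 in R)}
        safe = R | Rp
        for s0 in states - before:
            env_succ = {s1 for (a, s1) in delta_e if a == s0}
            # No environment transition may leave R ∪ R_p, and s0 must
            # have some way to progress: an environment step into
            # R ∪ R_p, or its own recovery transition.
            if env_succ <= safe and (env_succ or s0 in Rp):
                R.add(s0)
        if R == before:
            return R, Rp
```

After this loop, the complements of `R | Rp` and of `R` give the initial values of $ms_1$ and $ms_2$ used on Lines 16-17.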

In the loop on Lines 30-39, we remove some states from $S'$ to
avoid new behavior inside the invariant, just as we did in
Algorithm 2. If any state $s_0$ is removed from $S'$ in Lines
29 or 38, we need to repeat the loop on Lines 3-40, because it
is possible that a state in $R$ depended on $s_0$ to reach
$S'$, but $s_0$ is no longer in $S'$. In Line 32, the
algorithm declares that there does not exist a solution if
$S'$ is empty. Otherwise, when the loop on Lines 3-40
terminates, the algorithm returns $(\delta'_p, S')$ as the
solution to the addition problem.

Theorem 3. Algorithm 3 is sound and complete, and its
complexity is polynomial.

For reasons of space, we provide the proofs in the Appendix.

Finally, we note that the addition of nonmasking
fault-tolerance considered in [2] is also possible with
Algorithm 3. In particular, in this case, we need to set
$\delta_b$ to be the empty set. In principle, Algorithm 3
could also be used to add stabilization. However, we presented
Algorithm 1 separately since it is a much simpler algorithm
and forms the basis of Algorithm 3. Moreover, Algorithm 1 can
be extended to an arbitrary value of $k$ (as done in Algorithm
4 in the Appendix). However, the corresponding problem for
failsafe and masking fault-tolerance is open. In particular,
Algorithms 2 and 3 are sound even if we use an arbitrary value
of $k$. However, they are not complete.

6. EXTENSIONS OF ALGORITHMS 

In this section, we consider problems related to those
addressed in Sections 3 and 5. Our first variation focuses on
Definition 5. In this definition, we assumed that the
environment is fair. Specifically, at least $k - 1$ actions
execute between any two environment actions. We consider
variations where (1) this property is satisfied only
eventually, i.e., for some initial part of the computation,
environment actions may prevent the program from executing,
but eventually fairness is provided to program actions; and
(2) program actions are given even weaker fairness.
Specifically, we consider the case where several environment
actions can execute in a row but program actions execute
infinitely often.

Our second variation is related to the invariant of the
revised program, $S'$, and the invariant of the original
program, $S$. In the case of adding stabilization, we
considered $S' = S$, whereas in the case of adding failsafe and
masking fault-tolerance, we considered $S' \subseteq S$.

Changes to add stabilization and fault-tolerance
with eventually fair environment. No changes are
required to Algorithms 1 or 4 even if the environment is only
eventually fair. This is because these algorithms construct
programs that provide recovery from any state; in particular,
they provide recovery from the state reached at the point when
fairness is restored. For Algorithms 2 and 3, we should change
the input $f$ to include $\delta_e \cup \delta_f$. The
resulting algorithm ensures that the generated program allows
unfair execution in the initial states, while the appropriate
fault-tolerance is provided once fairness is restored.

Changes to add stabilization and fault-tolerance
with multiple consecutive environment actions. If
environment actions can execute consecutively, we can change
the input $\delta_e$ to be its transitive closure. In other
words, if $(s_0, s_1)$ and $(s_1, s_2)$ are transitions in
$\delta_e$, we add $(s_0, s_2)$ to $\delta_e$. With this
change, the constructed program will provide the appropriate
level of fault-tolerance (stabilizing, failsafe, or masking)
even if environment transitions can execute consecutively.
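
The closure step can be sketched in Python as follows. This is a naive fixpoint that assumes $\delta_e$ is given as a finite set of state pairs; the function name `transitive_closure` is illustrative only.

```python
def transitive_closure(delta_e):
    """Return the transitive closure of a relation given as a set of pairs."""
    closure = set(delta_e)
    while True:
        # Compose the relation with itself to find two-step paths.
        new_pairs = {(a, d)
                     for (a, b) in closure
                     for (c, d) in closure
                     if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs
```

For example, `transitive_closure({(0, 1), (1, 2)})` also contains `(0, 2)`. The closed relation would then be passed to the algorithms in place of the original $\delta_e$. The naive loop is polynomial in the number of states, so this preprocessing does not change the complexity class of the algorithms.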

Changes to add stabilization and fault-tolerance
based on relation between S' (invariant of the fault-
tolerant program) and S (invariant of the fault-intolerant
program). No changes are required to Algorithms 1 or 4 if we
change the problem statement to allow $S' \subseteq S$;
soundness and completeness are not affected. Regarding
soundness, observe that the programs generated by these
algorithms ensure $S' = S$. Hence, they trivially satisfy
$S' \subseteq S$. Regarding completeness, the intuition is
that if it is impossible to recover to states in $S$, then it
is impossible to recover to states in a subset of $S$.
Regarding Algorithms 2 and 3, if $S'$ is required to be equal
to $S$, then they need to be modified as follows: if any state
in $S$ is removed (due to it being in $ms_2$, deadlocks, etc.),
then they should declare failure.

7. RELATED WORK 

This paper focuses on addition of fault-tolerance proper¬ 
ties in the presence of unchangeable environment actions. 

This problem is an instance of model repair where some 
existing model/program is repaired to add new properties 
such as safety, liveness, fault-tolerance, etc. Model repair 






Algorithm 3 Adding Masking Fault-Tolerance

 1: $S' = S$;
 2: $\delta'_p = (\delta_p|S)$;
 3: repeat
 4:   $R = S'$;
 5:   $S'' = S'$;
 6:   repeat
 7:     $R' = R$;
 8:     $R_p = \{s_0 \mid s_0 \notin R \wedge \exists s_1 : s_1 \in R : (s_0, s_1) \notin (\delta_b \cup \delta_r)\}$;
 9:     for each $s_0 \in R_p$ do
10:       $\delta'_p = \delta'_p \cup \{(s_0, s_1) \mid (s_0, s_1) \notin (\delta_b \cup \delta_r) \wedge s_1 \in R\}$;
11:     end for
12:     for each $s_0 \notin R$ : $(\nexists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e) \wedge ((\exists s_1 : s_1 \in R \cup R_p : (s_0, s_1) \in \delta_e) \vee s_0 \in R_p)$ do
13:       $R = R \cup s_0$;
14:     end for
15:   until ($R' = R$);
16:   $ms_1 = \neg(R \cup R_p)$;
17:   $ms_2 = \neg R$;
18:   $ms_1 = ms_1 \cup \{s_0 \mid (s_0, s_1) \in f \cap \delta_b\}$;
19:   $ms_2 = ms_2 \cup ms_1 \cup \{s_0 \mid \exists s_1 :: (s_0, s_1) \in \delta_e \cap \delta_b\}$;
20:   $mt = \{(s_0, s_1) \mid (s_0, s_1) \in (\delta_b \cup \delta_r) \vee s_1 \in ms_2\}$;
21:   repeat
22:     $ms'_1 = ms_1$;
23:     $ms'_2 = ms_2$;
24:     $ms_1 = ms_1 \cup \{s_0 \mid \exists s_1 : s_1 \in ms_2 : (s_0, s_1) \in f\} \cup \{s_0 \mid ((\exists s_1 : s_1 \in ms_1 : (s_0, s_1) \in \delta_e) \vee (s_0, s_1) \in (\delta_e \cap \delta_b)) \wedge (\nexists s_2 :: (s_0, s_2) \in (\delta'_p - mt))\}$;
25:     $ms_2 = ms_2 \cup ms_1 \cup \{s_0 \mid \exists s_1 : s_1 \in ms_1 : (s_0, s_1) \in \delta_e\}$;
26:     $mt = \{(s_0, s_1) \mid (s_0, s_1) \in (\delta_b \cup \delta_r) \vee s_1 \in ms_2\}$;
27:   until $ms'_1 = ms_1 \wedge ms'_2 = ms_2$
28:   $\delta'_p = \delta'_p - mt$;
29:   $S' = RemoveDeadlock(S - ms_2, \delta'_p, \delta_e)$;
30:   repeat
31:     if $S' = \emptyset$ then
32:       return Not-possible;
33:     end if
34:     $S''' = S'$;
35:     $\delta'_p = EnsureClosure(\delta'_p, S')$;
36:     $ms_3 = \{s_0 \mid (\exists s_1, s_2 :: (s_0, s_1) \in \delta_e \wedge (s_0, s_2) \in \delta_p) \wedge (\nexists s_3 :: (s_0, s_3) \in \delta'_p)\}$;
37:     $ms_4 = \{s_0 \mid \exists s_1 :: (s_1 \in ms_3 \wedge (s_0, s_1) \in \delta_e)\}$;
38:     $S' = RemoveDeadlock(S' - ms_4, \delta'_p, \delta_e)$;
39:   until ($S''' = S'$)
40: until ($S'' = S'$)
41: return $(\delta'_p, S')$;


with respect to CTL properties was first considered in [7],
and abstraction techniques for the same are presented in [10].
In [15], the authors focus on the theory of model repair for
memoryless LTL properties in a game-theoretic fashion; i.e., a
repaired model is obtained by synthesizing a winning strategy
for a 2-player game. Previously [3], the authors considered
the problem of model repair for UNITY specifications [9].
These results identify the complexity of adding properties
such as invariant properties, leads-to properties, etc.
Repair of probabilistic algorithms has also been considered
in the literature [22].

The problem of adding fault-tolerance to an existing pro¬ 
gram has been discussed in the absence of environment ac¬ 
tions. This work includes work on controller synthesis m 
11411211 . A tool for automated addition of fault-tolerance to 
distributed programs is presented in [5]. This work utilizes
BDD-based techniques to enable synthesis of programs with
state space exceeding $10^{100}$. However, this work does not
include the notion of environment actions that cannot be
removed. Hence, applying it in contexts where some
processes/components cannot be changed will result in
unacceptable solutions. At the same time, we anticipate that
the BDD-based techniques considered in this work will be
especially valuable to improve the performance of the
algorithms presented in this paper.

The work on game theory E3E3E3 has focused on the 
problem of repair with 2-player game where the actions of 
the second player are not changed. However, this work does 
not address the issue of fault-tolerance. Also, the role of 
the environment in our work is more general than that in 
ir61H91E01f251 . Specifically, in the work on game theory, it 
is assumed that the players play in an alternating manner. 
By contrast, we consider more general interaction with the 
environment. 

In [6], the authors present an algorithm for adding recovery
to component-based models. They consider the problem where we
cannot add to the interface of a physical component. However,
that work does not consider the unchangeable actions of such
components, which are the focus of this work.


8. APPLICATION FOR DISTRIBUTED AND 
CYBER-PHYSICAL SYSTEMS 

We considered the problem of model repair for systems 
with unchangeable environment actions. By instantiating 
these environment actions according to the system under 
consideration, this work can be used in several contexts. We 
briefly outline how this work can be used in the context of 
distributed systems and cyber-physical systems. 

One instance of systems with unchangeable actions is dis¬ 
tributed programs consisting of several processes. Consider 
such a collaborative distributed program where some com¬ 
ponents are developed in house and some are third party 
components. It is anticipated that we are not allowed to 
change third party programs during repair. In that case, 
we can model the actions of those processes as unchange¬ 
able environment actions, and use algorithms provided in 
this paper to add stabilization/fault-tolerance. Our work is 
directly useful in high atomicity contexts where processes 
can view the state of all components but can modify only 
their own. In low atomicity contexts where processes have 
private memory that cannot be read by others, we need to 
introduce new restrictions. Specifically, in this context, we 
need to consider the issue of grouping [5] where adding or 
removing a transition requires one to add or remove groups 
of transitions. In particular, if two states $s_0$ and $s_1$
differ only in terms of private variables of another process,
then including a transition from $s_0$ requires us to add the
corresponding transition from $s_1$. Extending the algorithms
to this context is beyond the scope of this paper.

Another instance in this context is a cyber-physical sys¬ 
tem. Intuitively, a CPS consists of computational compo¬ 
nents and physical components. One typical constraint in 
repairing these systems to satisfy new requirements is that 
physical components cannot be modified due to complex¬ 
ity, cost, or their reliance on natural laws about physics, 
chemistry etc. In other words, to repair a CPS model, we 
may not be allowed to add/remove actions which model 
physical aspects of the system. Therefore, using the ap¬ 
proach proposed here, we can model such physical actions 
as unchangeable environment actions. After modeling the 
CPS, we can utilize the algorithms provided in this paper to 
add stabilization/fault-tolerance automatically, and be sure 
that the stabilizing/fault-tolerant models found by the algo¬ 
rithms do not require any change to physical components. 

9. CONCLUSION 

In this paper, we focused on the problem of adding fault- 
tolerance to an existing program which consists of some ac¬ 
tions that are unchangeable. These unchangeable actions 
arise due to interaction with the environment, inability to 
change parts of the existing program, constraints on physical 
components in a cyber-physical system, and so on. 

We presented algorithms for adding stabilization, failsafe 
fault-tolerance, masking fault-tolerance and nonmasking fault- 
tolerance. These algorithms are sound and complete and 
run in polynomial time (in the state space). This was unex¬ 
pected in part because environment actions can play both an 
assistive and disruptive role. The algorithm for adding fail¬ 
safe fault-tolerance was obtained by an extension of previ¬ 
ous algorithm DU that added failsafe fault-tolerance without 
environment actions. However, the algorithms for masking 
and stabilizing fault-tolerance required a significantly differ¬ 
ent approach in the presence of environment actions. 

We considered the cases where (1) all fault-free behav¬ 
iors are preserved in the fault-tolerant program, or (2) only 
a nonempty subset of fault-free behaviors are preserved in 
the fault-tolerant program. We also considered the cases 
where (1) environment actions can execute with any fre¬ 
quency for an initial duration and (2) environment actions 
can execute more frequently than programs. In all these 
cases, we demonstrated that our algorithm can be extended 
while preserving soundness and completeness. Finally, as
discussed in Section 8, these algorithms are especially useful
for repairing CPSs as well as repairing distributed systems
where only a subset of processes are repairable.
APPENDIX

A. PROOFS OF ALGORITHM 1

Based on the notion of fairness for program actions, we
introduce the notion of whether an environment transition
can be executed in a given computation prefix. An environment
action can execute in a computation prefix if an environment
action exists in the last state of the prefix and either (1)
the program cannot execute in the last state of the prefix or
(2) the program has already executed $k - 1$ steps. Thus,

Definition 18 (environment-enabled). In any prefix
$\alpha = \langle s_0, s_1, \ldots, s_i \rangle$ of
$p []_k \delta_e$, $s_i$ is an environment-enabled state iff

$(\exists s :: (s_i, s) \in \delta_e) \wedge ((\nexists s' :: (s_i, s') \in \delta_p) \vee (\nexists j : j > i - k : (s_j, s_{j+1}) \in \delta_e))$.
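
Definition 18 can be checked on a concrete prefix with a few lines of Python. The sketch below is an illustration under stated assumptions: the prefix is a list of states, transition relations are sets of pairs, indices $j$ below 0 are ignored for prefixes shorter than $k$, and the function name `environment_enabled` is hypothetical.

```python
def environment_enabled(prefix, delta_p, delta_e, k):
    """Return True iff the last state of the prefix is environment-enabled."""
    i = len(prefix) - 1
    last = prefix[i]
    # An environment transition must exist in the last state.
    env_exists = any(s0 == last for (s0, _s1) in delta_e)
    # (1) The program cannot execute in the last state.
    program_stuck = not any(s0 == last for (s0, _s1) in delta_p)
    # (2) None of the last k - 1 transitions is an environment transition.
    recent_env = any((prefix[j], prefix[j + 1]) in delta_e
                     for j in range(max(0, i - k + 1), i))
    return env_exists and (program_stuck or not recent_env)
```

For $k = 2$, the check reduces to asking whether the transition that led to the last state was an environment transition.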

Lemma 1. Every computation of $p' []_k \delta_e$ that starts
from a state in $R$ contains a state in $S'$.

Proof. We prove this by induction.

Base case: $R = S$. The statement is satisfied trivially.

Induction step: A state $s_0$ is added to $R$ in two cases:

Case 1: $(\nexists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e) \wedge (\exists s_1 : s_1 \in (R \cup R_p) : (s_0, s_1) \in \delta_e)$

Since there is no $s_2$ in $\neg(R \cup R_p)$ such that
$(s_0, s_2) \in \delta_e$, for every $(s_0, s_1) \in \delta_e$,
$s_1$ is in $R \cup R_p$. In addition, we know that there is at
least one $s_1$ in $R \cup R_p$ such that
$(s_0, s_1) \in \delta_e$. If $s_1$ is in $R_p$, there is a
program transition from $s_1$ to a state in $R$. As
$(s_0, s_1) \in \delta_e$, because of the fairness assumption,
the program transition can occur and reach $R$. Thus, every
computation starting from $s_0$ has a state in $R$ (as computed
in the previous iteration). Hence, every computation starting
from $s_0$ has a state in $S$.

Case 2: $(\nexists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e) \wedge s_0 \in R_p$

Since there is no $s_2$ in $\neg(R \cup R_p)$ such that
$(s_0, s_2) \in \delta_e$, for every $(s_0, s_1) \in \delta_e$,
$s_1$ is either in $R$ or in $R_p$. In addition, we know that
there is at least one state $s_1$ in $R \cup R_p$ such that
$(s_0, s_1) \in \delta_p$. In any computation of
$p' []_k \delta_e$ starting from $s_0$, if
$(s_0, s_1) \in \delta_p$ then $s_1 \in R$. If
$(s_0, s_1) \in \delta_e$ then $s_1 \in R \cup R_p$. If $s_1$
is in $R_p$, there is a program transition from $s_1$ to a
state in $R$. As $(s_0, s_1) \in \delta_e$, because of the
fairness assumption, the program can reach $R$. Thus, every
computation starting from $s_0$ has a state in $R$. Hence,
every computation starting from $s_0$ has a state in $S$.

□

Theorem 4. Algorithm 1 is sound.

Proof. At the beginning of the algorithm,
$\delta'_p = \delta_p|S$, and all other transitions added to
$\delta'_p$ in the rest of the algorithm start outside $S$, so
$p'|S = p|S$. Finally, the convergence condition is satisfied
based on Lemma 1 and the fact that $R$ includes all states. □

Now, we focus on showing that Algorithm 1 is complete, i.e.,
if there is a solution that satisfies the problem statement
for adding stabilization, Algorithm 1 finds one. The proof of
completeness is based on the analysis of states that are not
in $R$ upon termination.

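Observation 2. For any $s_0$ such that $s_0 \notin R$, we have
$\exists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e$, or
$(\nexists s_1 : s_1 \in (R \cup R_p) : (s_0, s_1) \in \delta_e) \wedge s_0 \notin R_p$.

Observation 3. For any $s_0$ such that $s_0 \notin R$ and
$\exists s_1 :: (s_0, s_1) \in \delta_e$, we have
$\exists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e$.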

Lemma 2. Let $\delta''_p$ be any program such that
$\delta''_p \cap \delta_b = \emptyset$. Let $s_j$ be any state
in $\neg(R \cup R_p)$. Then, either $s_j$ is a deadlock state
in $\delta''_p \cup \delta_e$, or for every $p'' []_k \delta_e$
prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$, there
exists a suffix
$\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that
$\alpha\beta$ is a $p'' []_k \delta_e$ computation, and one of
the two conditions below holds:

1. $s_{j+1} \in \neg(R \cup R_p)$

2. $s_{j+1} \in R_p - R \wedge s_{j+2} \in \neg(R \cup R_p)$

Proof. There are two cases for $s_j$:

Case 1: $s_j$ is environment-enabled.

Based on Observation 3, there should exist
$s'' \in \neg(R \cup R_p)$ such that
$(s_j, s'') \in \delta_e$. We set $s_{j+1} = s''$.

Case 2: $s_j$ is not environment-enabled.

In this case $(s_j, s_{j+1}) \in \delta''_p$, and as
$s_j \in \neg(R \cup R_p)$, $s_{j+1} \in \neg R$ (otherwise
$s_j$ would be in $R_p$). There are two sub-cases:

Case 2.1: $s_{j+1} \in \neg R_p$

In this case $s_{j+1} \in \neg(R \cup R_p)$.

Case 2.2: $s_{j+1} \in R_p$

As $s_{j+1} \in \neg R \cap R_p$, according to Observation 2
we have
$\exists s_2 : s_2 \in \neg(R \cup R_p) : (s_{j+1}, s_2) \in \delta_e$.
As $(s_j, s_{j+1}) \in \delta''_p$, even with fairness
$(s_{j+1}, s_2)$ can occur. Therefore we set $s_{j+2} = s_2$,
i.e., $s_{j+2} \in \neg(R \cup R_p)$. □

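Corollary 1. Let $\delta''_p$ be any program such that
$\delta''_p \cap \delta_b = \emptyset$. Let $s_j$ be any state
in $\neg(R \cup R_p)$. Then for every $p'' []_k \delta_e$
prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$, there
exists a suffix
$\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that
$\alpha\beta$ is a $p'' []_k \delta_e$ computation, and
$\forall i : i > j : s_i \in \neg R$ (i.e., $\neg S$).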

Theorem 5. Algorithm 1 is complete.

Proof. Algorithm 1 returns Not-possible only when, at the end
of the loop, there exists a state $s_0$ such that
$s_0 \notin R$. When $s_0 \notin R$, according to Observation
2, we have two cases:

Case 1: $\exists s_2 : s_2 \in \neg(R \cup R_p) : (s_0, s_2) \in \delta_e$

As there exists an environment action to a state $s_2$ in
$\neg(R \cup R_p)$, starting from $s_0$ there is a computation
whose next state is in $\neg(R \cup R_p)$. Note that, when a
computation starts from $s_0$, even with the fairness
assumption $(s_0, s_2) \in \delta_e$ can occur. Based on
Corollary 1, for every $\delta''_p$ such that
$\delta''_p \cap \delta_b = \emptyset$, starting from $s_0$,
there is a computation such that every state is in $\neg R$.

Case 2: $(\nexists s_1 : s_1 \in (R \cup R_p) : (s_0, s_1) \in \delta_e) \wedge s_0 \notin R_p$

Based on Corollary 1, starting from
$s_0 \in \neg(R \cup R_p)$, there is a computation such that
every state is in $\neg R$. Therefore, for every $\delta''_p$
such that $\delta''_p \cap \delta_b = \emptyset$, there is a
computation starting from $s_0$ such that all states are
outside $R$ (i.e., outside $S$). Thus, it is impossible to
have any stabilizing revision for the program.

□

Theorem 6. Algorithm 1 is polynomial (in the state space of
$p$).

Proof. The proof follows from the fact that each statement in
Algorithm 1 is executed in polynomial time and the number of
iterations is also polynomial, as in each iteration at least
one state is added to $R$. □

Theorem 1 follows from Theorems 4, 5, and 6.

B. ADDITION OF SAFE STABILIZATION 
FOR ANY K 

In this section, we present a general algorithm for the
addition problem defined in Section 3.1. Algorithm 1 generates
a program that is stabilizing when $k = 2$. Hence, the
generated programs are stabilizing even with a higher value of
$k$. However, Algorithm 1 will fail to find a program if
addition of stabilization requires a higher value of $k$.
Algorithm 4 is complete for any $k > 1$.

In this algorithm, state predicate $R$ is the set of states
such that every computation starting from them has a state in
$S$. At the beginning, $R$ is equal to $S$. In each iteration,
the value of Rank for each state shows the number of program
transitions needed to reach a state in $R$. At the beginning,
the Rank of every state in $R$ is 0 and all other states have
Rank equal to $\infty$. In each iteration, the repeat loop on
Lines 7-14 computes the smallest possible Rank for each state
outside $R$, and changes program transitions to reach $R$
using the minimum number of program transitions.

In the for loop on Lines 15-18, we add new states to $R$. We
add a state to $R$ whenever every computation starting from
that state has a state in $S$. A state $s_0$ can be added to
$R$ only when there is no environment transition starting from
$s_0$ going to a state with Rank $> k$. In addition to this
condition, there should be a way for $s_0$ to reach $R$.
Therefore, there should be at least one environment transition
to a state with Rank $< k$, or the Rank of $s_0$ should be 1,
which means that from $s_0$ we can reach a state in $R$ with
only one program transition. Just like Algorithm 1, this
algorithm terminates if no state can be added to $R$ in the
last iteration. At the end of the algorithm, if there is a
state outside $R$, we declare that there is no safe
stabilizing revision for the original program. Otherwise, the
algorithm returns the set of transitions of the revised
stabilizing program, $\delta'_p$.
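
The Rank relaxation performed by the inner repeat loop can be sketched in Python as follows. This is a simplified illustration, not the algorithm itself: it assumes that any transition outside $\delta_b$ may be added as a program transition, it records the chosen recovery transition in a dictionary instead of rewriting $\delta'_p$, and the name `relax_ranks` is hypothetical.

```python
import math


def relax_ranks(states, R, delta_b):
    """Compute, for every state, the least number of allowed program
    transitions needed to reach R, with one chosen successor per state."""
    rank = {s: (0 if s in R else math.inf) for s in states}
    choice = {}
    changed = True
    while changed:
        changed = False
        for s0 in states:
            if s0 in R:
                continue
            for s1 in states:
                # Relax: going through s1 gives a strictly smaller Rank.
                if (s0, s1) not in delta_b and rank[s1] + 1 < rank[s0]:
                    rank[s0] = rank[s1] + 1
                    choice[s0] = s1
                    changed = True
    return rank, choice
```

States that keep Rank $\infty$ have no allowed program path to $R$ in the current iteration, so they can only enter $R$ through the environment-based condition of the for loop.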

Now, we provide the proofs for Algorithm 4.

Lemma 3. Every computation of $p' []_k \delta_e$ that starts
from a state in $R$ contains a state in $S'$.

Proof. Proof by induction:

Base case: $R = S$

Induction step: There are two cases in which the algorithm
adds $s_0$ to $R$ (i.e., sets its Rank to 0):

Case 1: $(\nexists s_2 : Rank.s_2 > k : (s_0, s_2) \in \delta_e) \wedge (\exists s_1 : Rank.s_1 < k : (s_0, s_1) \in \delta_e)$

In this case, any environment transition starting from $s_0$
reaches a state with Rank $< k$. Therefore, with fewer than
$k$ transitions, it is possible to reach $S$. Moreover, at
least one such transition exists, so reaching $S$ from this
state is guaranteed.

Case 2: $(\nexists s_2 : Rank.s_2 > k : (s_0, s_2) \in \delta_e) \wedge (Rank.s_0 = 1)$

In this case, there is no environment action from $s_0$ to a
state with Rank $> k$, and the program can reach a state with
Rank $= 0$ in one step by a program transition. So, recovery
to $S$ is guaranteed. □

Theorem 7. Algorithm 4 is sound.





Algorithm 4 Addition of safe stabilization for any k
Input: $S_p$, $\delta_p$, $\delta_e$, $S$, $\delta_b$, and $k$
Output: $\delta'_p$ or Not-possible

1: $\delta'_p := (\delta_p | S)$;
2: $R = S$;
3: $\forall s : s \in R : Rank.s = 0$;
4: $\forall s : s \notin R : Rank.s = \infty$;
5: repeat
6:   $R' = R$;
7:   repeat
8:     $\delta''_p = \delta'_p$;
9:     if $\exists s_0 : s_0 \notin R : (\exists s_1 : Rank.s_1 + 1 < Rank.s_0 : (s_0, s_1) \notin \delta_b)$ then
10:      $\delta'_p = \delta'_p - \{(s_0, s) \mid (s_0, s) \in \delta'_p\}$;
11:      $\delta'_p = \delta'_p \cup \{(s_0, s_1)\}$;
12:      $Rank.s_0 = Rank.s_1 + 1$;
13:    end if
14:  until ($\delta''_p = \delta'_p$)
15:  for each $s_0 : s_0 \notin R : (\nexists s_2 : Rank.s_2 > k : (s_0, s_2) \in \delta_e) \wedge ((\exists s_1 : Rank.s_1 < k : (s_0, s_1) \in \delta_e) \vee (Rank.s_0 = 1))$ do
16:    $Rank.s_0 = 0$;
17:    $R = R \cup \{s_0\}$;
18:  end for
19: until ($R' = R$)
20: if $\exists s_0 : s_0 \notin R$ then
21:   return Not-possible;
22: else
23:   return $\delta'_p$;
24: end if
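The fixpoint structure of Algorithm 4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: states are hashable values, transition relations are sets of pairs, the function name `add_safe_stabilization` and the `recovery` map are hypothetical, and only the recovery transitions chosen for states outside $S$ are returned (the transitions of $\delta_p | S$ are left out for brevity). The order in which the for-each loop on Lines 15-18 visits states is not fixed by the pseudocode; the sketch simply visits them in sorted order.

```python
import math


def add_safe_stabilization(states, delta_e, delta_b, S, k):
    """Rank-based fixpoint sketch; returns {state: successor} or None."""
    states = sorted(states)
    R = set(S)
    rank = {s: (0 if s in R else math.inf) for s in states}
    recovery = {}  # hypothetical: the one program transition kept per state outside S

    def env_succ(s0):
        return [t for (s, t) in delta_e if s == s0]

    while True:
        R_prev = set(R)
        # Lines 7-14: give each state outside R its lowest-rank safe successor.
        changed = True
        while changed:
            changed = False
            for s0 in states:
                if s0 in R:
                    continue
                for s1 in states:
                    if (s0, s1) not in delta_b and rank[s1] + 1 < rank[s0]:
                        recovery[s0] = s1
                        rank[s0] = rank[s1] + 1
                        changed = True
        # Lines 15-18: promote states the environment cannot push above rank k.
        for s0 in states:
            if s0 in R:
                continue
            succ = env_succ(s0)
            if any(rank[t] > k for t in succ):
                continue
            if any(rank[t] < k for t in succ) or rank[s0] == 1:
                rank[s0] = 0
                R.add(s0)
        if R == R_prev:  # Line 19: no state was added in this iteration
            break
    return recovery if all(s in R for s in states) else None
```

For instance, with three states, $S = \{0\}$ and no environment or bad transitions, both remaining states obtain a one-step recovery transition to state 0. If instead every transition out of state 2 that lowers its rank is bad and the environment can move state 1 to state 2, the sketch reports that no revision exists.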


Proof. The proof of this theorem is essentially the same as that of
Theorem 4; we only need to use Lemma 3 instead of Lemma 1. □

Observation 4. For any state $s_0$, if $Rank.s_0 > 0$ (i.e.,
$s_0 \notin R$), then

1. $\exists s_2 : Rank.s_2 > k : (s_0, s_2) \in \delta_e$, or

2. $(\nexists s_1 : Rank.s_1 < k : (s_0, s_1) \in \delta_e) \wedge Rank.s_0 > 1$

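Observation 5. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank > 0$, and $\exists s :: (s_j, s) \in \delta_e$.
Then for every $p'' []_k \delta_e$ prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$,
there exists a suffix $\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that $\alpha\beta$ is a
$p'' []_k \delta_e$ computation, and $Rank.s_{j+1} > k \wedge (s_j, s_{j+1}) \in \delta_e$.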

Observation 6. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank = m$. Then there does not
exist a state $s'$ with $Rank.s' < m - 1$ such that $(s_j, s') \in \delta''_p$.

Theorem 8. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank = m$. Then for every
$p'' []_k \delta_e$ computation $\langle \ldots, s_{j-1}, s_j, s_{j+1}, \ldots \rangle$ where $\forall l : l > j : (s_l, s_{l+1}) \in \delta''_p$,
one of the two following conditions is true:

1. $\exists i : i > j : Rank.s_i = m - 1$.

2. $\forall i : i > j : Rank.s_i \geq m$.

Proof. We prove this theorem by contradiction.

Suppose it is not true. Then there is a $p'' []_k \delta_e$ computation
$\langle \ldots, s_{j-1}, s_j, s_{j+1}, \ldots \rangle$ where $Rank.s_j = m$ and
$\forall l : l > j : (s_l, s_{l+1}) \in \delta''_p$, but

$\exists k : k > j : (\forall l : j \leq l \leq k : Rank.s_l \geq m) \wedge Rank.s_{k+1} < m - 1$.
This contradicts Observation 6. □


Corollary 2. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank = m$ and $s_k$ be
any state with $Rank = 0$. Then for every $p'' []_k \delta_e$ prefix
$\langle \ldots, s_{j-1}, s_j, s_{j+1}, \ldots, s_k \rangle$ where $\forall l : l > j : (s_l, s_{l+1}) \in \delta''_p$,

$\exists i : j \leq i < k : Rank.s_i = 1$.

Corollary 3. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank = m$ and $s_k$ be
any state with $Rank = n$. Then for every $p'' []_k \delta_e$ prefix
$\langle \ldots, s_{j-1}, s_j, s_{j+1}, \ldots, s_k \rangle$ where $\forall l : l > j : (s_l, s_{l+1}) \in \delta''_p$,
$k - j \geq m - n$.
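The bound in Corollary 3 can be read as a telescoping consequence of Observation 6. The following short derivation is a sketch under one assumption that is ours: every transition from $s_j$ up to $s_k$ in the prefix is a program transition of $\delta''_p$, so that Observation 6 applies to each step. Here $k$ is the index of the last state of the prefix (as in the corollary), $m = Rank.s_j$ and $n = Rank.s_k$.

```latex
\begin{align*}
m - n &= Rank.s_j - Rank.s_k
       = \sum_{l=j}^{k-1} \bigl( Rank.s_l - Rank.s_{l+1} \bigr) \\
      &\le \sum_{l=j}^{k-1} 1 \;=\; k - j .
\end{align*}
```

Each summand is at most 1 because, by Observation 6, a single program transition cannot lower the rank by more than one.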

Theorem 9. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank > k$. Then for every
$p'' []_k \delta_e$ prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$, there exists a suffix
$\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that $\alpha\beta$ is a $p'' []_k \delta_e$ computation, and
one of the two cases below holds:

1. $\forall i : j < i : Rank.s_i > 0$

2. $\exists i : j < i : Rank.s_i > k \wedge \forall l : j < l < i : Rank.s_l > 0$

Proof. If $\exists s :: (s_j, s) \in \delta_e$, then according to Observation 5
there is an environment action to a state with $Rank > k$.
Therefore, that transition can occur and the program reaches
a state with $Rank > k$. Otherwise, if there is no environment
transition starting from $s_j$, then $s_j$ is either a deadlock
state or has a program transition in $\delta''_p$. If it is a deadlock state, the
theorem is proved. Otherwise, the same property is true in
the next state $s_{j+1}$ that is reached by the program action if
$Rank.s_{j+1} > 1$. If there is no deadlock or environment
transition for a sequence of states, then we have a sequence of
program transitions. If the Rank of all states in this sequence is
greater than 0, the theorem is proved. Otherwise, if there is
$s_k$ with $Rank = 0$, according to Corollary 2 we reach a state $s'$
with $Rank = 1$. According to Observation 4, from $s'$ there is
an environment transition $(s', s'')$ to a state with $Rank > k$.
According to Corollary 3, to reach $s'$ with $Rank = 1$ from
$s_j$ with $Rank > k$, at least $k - 1$ program transitions are
needed. This means that, even with the fairness assumption, an
environment transition can occur from $s'$. Therefore, $(s', s'')$
can occur and reach $s''$ with $Rank > k$. □

Corollary 4. Let $p''$ be any program such that $\delta''_p \cap \delta_b = \emptyset$.
Let $s_j$ be any state with $Rank > k$. Then for every
$p'' []_k \delta_e$ prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$, there exists a suffix
$\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that $\alpha\beta$ is a $p'' []_k \delta_e$ computation, and
$\forall i : i > j : Rank.s_i > 0$.

Theorem 10. Algorithm 4 is complete.

Proof. According to Observation 4, if we have a state $s_0$
with $Rank > 0$, we have two cases:

Case 1: $\exists s_2 : Rank.s_2 > k : (s_0, s_2) \in \delta_e$

As there exists an environment action to a state $s_2$ with $Rank > k$,
starting from $s_0$ there is a computation whose next state has
$Rank > k$. Note that, when a computation starts from $s_0$,
even with the fairness assumption $(s_0, s_2) \in \delta_e$ can occur. Based
on Corollary 4, starting from $s_0$, there is a computation such
that every state is outside $S$.

Case 2: $(\nexists s_1 : Rank.s_1 < k : (s_0, s_1) \in \delta_e) \wedge (Rank.s_0 > 1)$

If $\exists s :: (s_0, s) \in \delta_e$, then according to Observation 5 there is
an environment action to a state with $Rank > k$. Therefore,
that transition can occur and the program reaches a state
with $Rank > k$. Otherwise, if there is no environment
transition starting from $s_0$, then $s_0$ is either a deadlock state or has
a program transition in $\delta''_p$. If it is a deadlock state, there is
no computation that reaches $S$. Otherwise, the same
property is true in the next state, $s_1$, that is reached by
the program action if $Rank.s_1 > 1$. If there is no deadlock
or environment transition for a sequence of states, then
we have a sequence of program transitions. If the Rank of all
states in this sequence is greater than 0, then there
is a computation that does not reach $S$, and the theorem is
proved. Otherwise, if there is $s_k$ with $Rank = 0$, according
to Corollary 2 we reach a state $s'$ with $Rank = 1$. According
to Observation 4, from $s'$ there is an environment transition
$(s', s'')$ to a state with $Rank > k$. According to Corollary 3,
to reach $s'$ with $Rank = 1$ from $s_0$ with $Rank > k$, at least
$k - 1$ program transitions are needed. This means that, even
with the fairness assumption, an environment transition can
occur from $s'$. Therefore, $(s', s'')$ can occur and reach $s''$ with
$Rank > k$. Then, according to Corollary 4, there is a
computation starting from $s''$ (and thus from $s_0$) in which all the states
are outside $S$. □

Theorem 11. Algorithm 4 is polynomial (in the state space
of $p$).

Proof. The proof follows from the fact that each statement
in Algorithm 4 is executed in polynomial time and the
number of iterations is also polynomial, as in each iteration
at least one state is added to $R$. □

C. PROOFS FOR ALGORITHM 2

In this section, we prove the soundness (Theorem 12) and
completeness (Theorem 16) of this algorithm with the help
of Lemma 4, Corollaries 5 and 6, and Theorem 15. We also
prove the complexity result for this algorithm (Theorem 17).

The following lemma splits condition C1 into easily checkable
conditions that assist in soundness and completeness.
Specifically, it shows that condition C1 is satisfied iff $p'$
does not include any new states or transitions in $S'$. It also
ensures that new computations are not created in $p'$ due
to deadlocks that may be created by the removal of
transitions of $p$.

Lemma 4. The condition C1 in the problem definition of
addition of fault-tolerance is satisfied for $k = 2$ iff the conditions
below are satisfied:

1. $\delta'_p \cup \delta_e$ is closed in $S'$

2. $S' \subseteq S$

3. $\delta'_p | S' \subseteq \delta_p | S'$

4. $\forall s_1 : (\exists s_0, s_2, s_3 : s_0 \in S' \wedge (s_0, s_1), (s_1, s_2) \in \delta_e \wedge (s_1, s_3) \in \delta_p) : (\exists s_4 :: (s_1, s_4) \in \delta'_p)$

5. $\exists (s_0, s_1) :: (s_0, s_1) \in (\delta_p \cup \delta_e) \Rightarrow \exists (s_0, s_2) :: (s_0, s_2) \in (\delta'_p \cup \delta_e)$
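For small explicit models, the five conditions of Lemma 4 can be checked mechanically. The sketch below is illustrative only and rests on assumptions that are ours: transition relations are Python sets of pairs, the restriction $\delta | X$ keeps transitions with both endpoints in $X$, condition 5 is read literally over every state that has an outgoing transition in $\delta_p \cup \delta_e$, and the function name `lemma4_conditions` is hypothetical.

```python
def lemma4_conditions(S, S_new, dp, dp_new, de):
    """Return the five conditions of Lemma 4 as a list of booleans."""

    def has_out(T, s):
        return any(u == s for (u, _) in T)

    def restrict(T, X):
        # Assumption: T|X keeps transitions that start and end in X.
        return {(u, v) for (u, v) in T if u in X and v in X}

    # 1. dp_new ∪ de is closed in S_new.
    c1 = all(v in S_new for (u, v) in dp_new | de if u in S_new)
    # 2. S_new ⊆ S.
    c2 = S_new <= S
    # 3. dp_new|S_new ⊆ dp|S_new.
    c3 = restrict(dp_new, S_new) <= restrict(dp, S_new)
    # 4. A state entered from S_new by the environment that has both an
    #    environment transition and an original program transition must
    #    keep some program transition in dp_new.
    entered = {v for (u, v) in de if u in S_new}
    c4 = all(has_out(dp_new, s1) for s1 in entered
             if has_out(de, s1) and has_out(dp, s1))
    # 5. No state that could move under dp ∪ de is a deadlock under dp_new ∪ de.
    sources = {u for (u, _) in dp | de}
    c5 = all(has_out(dp_new | de, s0) for s0 in sources)
    return [c1, c2, c3, c4, c5]
```

Removing the only program transition of a state that also has an incoming and an outgoing environment transition makes the fourth entry false, which matches the role condition 4 plays in Case 2.2 of the proof.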

Proof. ($\Rightarrow$) If any of the five conditions is violated, then
we can easily create a new computation of $p' []_2 \delta_e$ that is not
a computation of $p []_2 \delta_e$, thereby violating C1.

($\Leftarrow$) We show by induction that if the five conditions of the
lemma are satisfied, then every prefix $\alpha = \langle s_0, s_1, \ldots, s_i \rangle$
of $p' []_2 \delta_e$ that starts from a state in $S'$ is a prefix of $p []_2 \delta_e$
which starts in $S$.

As $S' \subseteq S$, we know that $s_0 \in S$. Then for every $i \geq 0$ we
have the cases below:

Case 1: $(s_i, s_{i+1}) \in \delta'_p$

Since $\delta'_p \cup \delta_e$ is closed in $S'$, we know that $s_{i+1} \in S'$. As
$\delta'_p | S' \subseteq \delta_p | S'$, we have $(s_i, s_{i+1}) \in \delta_p$. Therefore, if $\langle s_0, s_1, \ldots, s_i \rangle$
is a prefix of $p []_2 \delta_e$, then $\langle s_0, s_1, \ldots, s_i, s_{i+1} \rangle$ is a prefix of
$p []_2 \delta_e$.

Case 2: $(s_i, s_{i+1}) \in \delta_e$

Two sub-cases are possible:

Case 2.1: $(s_{i-1}, s_i) \in \delta_p$

In this case, as we have reached $s_i$ by a program transition,
even with the fairness assumption, $(s_i, s_{i+1})$ can occur in
$p []_2 \delta_e$. Therefore, if $\langle s_0, s_1, \ldots, s_i \rangle$ is a prefix of $p []_2 \delta_e$, then
$\langle s_0, s_1, \ldots, s_i, s_{i+1} \rangle$ is a prefix of $p []_2 \delta_e$.

Case 2.2: $(s_{i-1}, s_i) \in \delta_e$

In this case, there should not exist $s'$ such that $(s_i, s') \in \delta'_p$
(otherwise, because of fairness, $(s_i, s_{i+1})$ cannot be in any
prefix of $p' []_2 \delta_e$). Then, according to condition 4, there should
not exist a state $s''$ such that $(s_i, s'') \in \delta_p$; otherwise, condition
4 of the lemma is violated. As there is no such $s''$, even with the
fairness assumption $(s_i, s_{i+1})$ can occur in $p []_2 \delta_e$. Therefore,
if $\langle s_0, s_1, \ldots, s_i \rangle$ is a prefix of $p []_2 \delta_e$, then $\langle s_0, s_1, \ldots, s_i, s_{i+1} \rangle$
is a prefix of $p []_2 \delta_e$.

Case 3: $s_i$ is a deadlock in $\delta'_p \cup \delta_e$

From condition 5, $s_i$ is a deadlock in $\delta_p \cup \delta_e$ as well, so $\langle s_0, s_1, \ldots, s_i \rangle$
is a computation of $p []_2 \delta_e$.

As every prefix of $p' []_2 \delta_e$ that starts from a state in $S'$ is a
prefix of $p []_2 \delta_e$ which starts in $S$, C1 is satisfied.

□
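The fairness reasoning used in Cases 2.1 and 2.2 can be made concrete with a small prefix checker. This is a sketch of our reading of the $k = 2$ case as it is used in the proof, not a definition taken from the paper: an environment transition from a state is allowed if the state was reached by a program transition, or if no program transition is enabled there. Two further assumptions are ours: the first transition of a sequence may be an environment transition, and a pair that belongs to both relations is treated as a program transition. The name `is_prefix_k2` is hypothetical.

```python
def is_prefix_k2(seq, delta_p, delta_e):
    """Check a state sequence against the k = 2 fairness reading above."""
    prev_env = False  # assumption: the first step may be environmental
    for a, b in zip(seq, seq[1:]):
        program_enabled = any(u == a for (u, _) in delta_p)
        if (a, b) in delta_p:
            prev_env = False
        elif (a, b) in delta_e:
            # Case 2.2: after an environment step, another environment step
            # is possible only if no program transition is enabled.
            if prev_env and program_enabled:
                return False
            prev_env = True
        else:
            return False
    return True
```

With `delta_p = {(1, 2)}` and `delta_e = {(0, 1), (1, 3)}`, the sequence `[0, 1, 3]` is rejected, but it is accepted once the program transition `(1, 2)` is removed. This is exactly the kind of new computation that condition 4 of Lemma 4 rules out.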

Theorem 12. Algorithm 2 is sound.

Proof. To show the soundness of our algorithm, we need
to show that the three conditions of the addition problem
are satisfied.

C1: Consider a computation $c$ of $p' []_2 \delta_e$ that starts from
a state in $S'$. By construction, $c$ starts from a state in $S$,
and $\delta'_p | S'$ is a subset of $\delta_p | S'$. In addition, $\delta'_p \cup \delta_e$ is closed
in $S'$. Therefore, the first three requirements of Lemma 4
are satisfied. Now, we show that the fourth and fifth requirements
of Lemma 4 are satisfied as well.

Regarding the fourth requirement of Lemma 4, suppose that
there exists $s_1$ in $S'$ such that $(\exists s_0, s_2, s_3 : s_0 \in S' \wedge (s_0, s_1), (s_1, s_2) \in \delta_e \wedge (s_1, s_3) \in \delta_p)$
but $\nexists s_4 :: (s_1, s_4) \in \delta'_p$. From $\exists s_2, s_3 :: (s_1, s_2) \in \delta_e \wedge (s_1, s_3) \in \delta_p$
and $\nexists s_4 :: (s_1, s_4) \in \delta'_p$ we can
conclude that $s_1$ is in $ms_3$. Then, from $(s_0, s_1) \in \delta_e$, we
know $s_0$ is in $ms_4$, which is a contradiction as $s_0$ is in $S'$.

Finally, the fifth requirement is satisfied based on our
approach for dealing with deadlock states in Assumption 1.

Hence, C1 holds.

C2: From C1, and the assumption that $p []_2 \delta_e$ refines spec
from $S$, $p' []_2 \delta_e$ refines spec from $S'$.

Let spec be $\langle Sf, Lv \rangle$. Consider a prefix $c$ of $p' []_2 \delta_e [] f$ such
that $c$ starts from a state in $S'$. If $c$ does not refine $Sf$,
then there exists a prefix of $c$, say $\langle s_0, s_1, \ldots, s_n \rangle$, such that
it has a transition in $\delta_b$. Wlog, let $\langle s_0, s_1, \ldots, s_n \rangle$ be the
smallest such prefix. It follows that $(s_{n-1}, s_n) \in \delta_b$; hence,
$(s_{n-1}, s_n) \in mt$. By construction, $p'$ does not contain any
transition in $mt$. Thus, $(s_{n-1}, s_n)$ is a transition of $f$ or $\delta_e$.
If it is in $f$, then $s_{n-1} \in ms_1$ (i.e., $s_{n-1} \in ms_2$). If it is in $\delta_e$,
then $s_{n-1} \in ms_2$. Therefore, in both cases, $s_{n-1} \in ms_2$, and
$(s_{n-2}, s_{n-1}) \in mt$. Again, by construction we know that
$\delta'_p$ does not contain any transition in $mt$, so $(s_{n-2}, s_{n-1})$
is either in $f$ or $\delta_e$. If it is in $f$, then $s_{n-2} \in ms_1$ (i.e.,
$s_{n-2} \in ms_2$). If it is in $\delta_e$, two cases are possible:

1) $(s_{n-1}, s_n) \in f$. In this case, as stated before, $s_{n-1} \in ms_1$,
so $s_{n-2} \in ms_2$.

2) $(s_{n-1}, s_n) \in \delta_e$. In this case, all transitions starting
from $s_{n-1}$ should be in $mt$. If this is not the case, then
there exists a state $s$ such that $(s_{n-1}, s)$ is not
in $mt$, and we would have added it to $\delta'_p$ by Line 23.

Since all transitions from $s_{n-1}$ are in $mt$, $s_{n-1}$ is in $ms_1$
(by LineQ). Hence, $s_{n-2}$ is in $ms_2$.

Continuing this argument further leads to the conclusion
that $s_0 \in ms_2$. This is a contradiction. Thus, any prefix of
$p' []_2 \delta_e [] f$ refines $Sf$. Thus, C2 holds.

C3: Any $(s_0, s_1) \in \delta_r$ is in $mt$. By construction, $\delta'_p$ does
not have any transition in $mt$. Hence, C3 holds. □

Now, we focus on showing that Algorithm 2 is complete,
i.e., if there is a solution that satisfies the problem statement
for adding failsafe fault-tolerance, Algorithm 2 finds one.
The proof of completeness is based on the analysis of states
that were removed from $S$.

Observation 7. For every state $s_0$ in $ms_2$, one of the three
cases below is true:

1. $s_0 \in ms_1$

2. $\exists s_1 :: (s_0, s_1) \in \delta_e \wedge s_1 \in ms_1$

3. $\exists s_1 :: (s_0, s_1) \in \delta_e \cap \delta_b$
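The three cases of Observation 7 translate directly into a membership test. The sketch below assumes that $ms_1$ and $ms_2$ have already been computed (by Algorithm 2, which is not reproduced here) and are passed in as explicit Python sets; the function name `observation7_holds` is hypothetical and the code is meant only as a sanity check on small models.

```python
def observation7_holds(ms1, ms2, delta_e, delta_b):
    """Check that every state of ms2 falls under one of the three cases."""

    def covered(s0):
        if s0 in ms1:  # case 1
            return True
        # case 2: an environment step into ms1, or
        # case 3: an environment step that is itself a bad transition
        return any(u == s0 and (v in ms1 or (u, v) in delta_b)
                   for (u, v) in delta_e)

    return all(covered(s0) for s0 in ms2)
```

A state of $ms_2$ that is not in $ms_1$ and has no outgoing environment transition makes the check fail, which is the situation the observation excludes.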

Theorem 13. Let $p''$ be any program that solves the problem
of adding failsafe fault-tolerance. For every prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$
of $p'' []_2 \delta_e$ where $s_j \in ms_2$ and $(s_{j-1}, s_j) \in f \cup \delta''_p$,
there exists a suffix $\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that
$\alpha\beta$ is a computation of $p'' []_2 \delta_e$ and $\exists i : i \geq j : (s_i \in ms_1) \vee ((s_i, s_{i+1}) \in \delta_b)$.

Proof. According to Observation 7, $s_j$ is either in $ms_1$,
can reach a state in $ms_1$ by an environment action, or has an
environment action in $\delta_b$. If $s_j \in ms_1$, the theorem is proved. If
$s_j$ is not in $ms_1$, there exists an environment action $e$ which
is either in $\delta_b$ or reaches a state in $ms_1$. As we have reached
$s_j$ by a transition in $\delta''_p \cup f$, even with the fairness assumption,
$e$ can be executed. Thus, either a transition in $\delta_b$ occurs or a
state in $ms_1$ is reached. □

Theorem 14. Let $p''$ be any program that solves the problem
of adding failsafe fault-tolerance. For every prefix $\alpha = \langle \ldots, s_{j-1}, s_j \rangle$
of $p'' []_2 \delta_e$ where $s_j \in ms_1$, there exists a suffix
$\beta = \langle s_{j+1}, s_{j+2}, \ldots \rangle$ such that $\alpha\beta$ is a computation of $p'' []_2 \delta_e$
and $\exists i : i \geq j : (s_i, s_{i+1}) \in \delta_b$.

Proof. We prove this inductively based on when states
are added to $ms_1$.

Base case: $ms_1 = \{s_0 \mid (s_0, s_1) \in f \cap \delta_b\}$

Since fault transitions can execute in any state, the theorem
is satisfied by construction.

Induction step: A state $s_0$ is added into $ms_1$ in three cases:

Case 1: $\exists s_1 : s_1 \in ms_2 : (s_0, s_1) \in f$

In this case, according to Theorem 13, a transition in $\delta_b$ may
occur, or a state in $ms_1$ can be reached. Hence, according
to the induction hypothesis, a transition in $\delta_b$ can occur in both
cases.

Case 2: $\exists s_1 :: (s_1 \in ms_1 \wedge (s_0, s_1) \in \delta_e) \wedge (\forall s_2 :: (s_0, s_2) \in mt)$

In this case, if $(s_0, s_1)$ can occur according to fairness, state
$s_1 \in ms_1$ can be reached by $(s_0, s_1)$, and according to the induction
hypothesis safety may be violated. However, if $(s_0, s_1)$
cannot occur, some other transition in $\delta''_p \cup f$ should occur,
but we know such a transition should be in $mt$ and reaches
a state in $ms_2$. Thus, according to Theorem 13, either a
transition in $\delta_b$ can occur, or a state in $ms_1$ can be reached.
Hence, according to the induction hypothesis, a transition in $\delta_b$
can occur in both cases.

Case 3: $((s_0, s_1) \in \delta_e \cap \delta_b) \wedge (\forall s_2 :: (s_0, s_2) \in mt)$

In this case, if $(s_0, s_1)$ can occur according to fairness, safety
is violated by its occurrence. However, if $(s_0, s_1)$ cannot
occur, some other transition in $\delta''_p \cup f$ should occur, but we
know such a transition should be in $mt$ and reaches a state in
$ms_2$. Thus, according to Theorem 13, either a transition in
$\delta_b$ can occur, or a state in $ms_1$ can be reached. Hence,
according to the induction hypothesis, a transition in $\delta_b$ can occur
in both cases. □

According to Theorem 13 and Theorem 14, we have the
following two corollaries.

Corollary 5. Let $p''$ be any program that solves the problem
of adding failsafe fault-tolerance and let $S''$ be its invariant.
Then, $S'' \cap ms_2 = \emptyset$.

Corollary 6. Let $p''$ be any program that solves the problem
of adding failsafe fault-tolerance and let $S''$ be its invariant.
Then $p'' | S''$ cannot have any transition in $mt$.

Theorem 15. Let $p''$ be any program that solves the problem
of adding failsafe fault-tolerance, and let $S''$ be its invariant.
Then $S''$ cannot include any state in set $ms_4$ in
any iteration of the loop on Lines 13-22.

Proof. We show that if $s_0$ is in $S''$, and $s_0$ is in $ms_4$ in
any iteration of the loop on Lines 13-22, then there is a sequence
$\sigma = \langle s_0, s_1, \ldots \rangle$ such that $\sigma$ is a $p'' []_2 \delta_e$ computation, but it is
not a $p []_2 \delta_e$ computation.

Suppose $s_0$ is in $ms_4$ in the first iteration of the loop.
Then there is a state $s \in ms_3$ such that $(s_0, s) \in \delta_e$. Thus,
$(\exists s_1, s_2 :: (s, s_1) \in \delta_e \wedge (s, s_2) \in \delta_p) \wedge (\nexists s_3 :: (s, s_3) \in \delta'_p)$.
Since $p''$ solves the problem, $S''$ is closed in $p'' []_2 \delta_e$. Hence,
$s$ is in $S''$ as well. In the first iteration, $\delta''_p | S'' \subseteq \delta'_p$, because
according to Corollary 5 and Corollary 6, $\delta''_p$ cannot have a
transition in $mt$, and $S - ms_2$ should be closed in $\delta''_p \cup \delta_e$.
Therefore, $(\exists s_1, s_2 :: (s, s_1) \in \delta_e \wedge (s, s_2) \in \delta_p) \wedge (\nexists s_3 :: (s, s_3) \in \delta''_p)$.

Now, observe that $\langle s_0, s, s_1, \ldots \rangle$ is a $p'' []_2 \delta_e$ computation,
but it is not a $p []_2 \delta_e$ computation because, due to fairness,
$(s, s_1)$ cannot occur when there exists $(s, s_2) \in \delta_p$. Since $p''$
solves the addition problem, $S''$ cannot have any state in $ms_4$
in the first iteration. Therefore, as $S''$ should be closed in
$p'' []_2 \delta_e$, all transitions to states in $ms_4$ in the first iteration
should be removed from $\delta''_p$. Thus, $\delta''_p | S'' \subseteq \delta'_p$ in the second
iteration as well, and with the same argument we can show that
if $s_0 \in S''$ is in $ms_4$ in any iteration, then there is a sequence
$\sigma = \langle s_0, s_1, \ldots \rangle$ such that $\sigma$ is a $p'' []_2 \delta_e$ computation, but it
is not a $p []_2 \delta_e$ computation. □

Theorem 16. Algorithm 2 is complete.

Proof. Let program $p''$ and predicate $S''$ solve the transformation
problem. $S''$ should satisfy the following requirements:

1. $S'' \cap ms_2 = \emptyset$

2. $S''$ does not include any state in set $ms_4$ in any iteration
of the loop on Lines 13-22

3. $\nexists s_0 : s_0 \in S'' : (\exists s_1 : s_1 \notin S'' : (s_0, s_1) \in \delta_e)$

The first requirement follows from Corollary 5. The
second requirement follows from Theorem 15, and the
third requirement follows from the fact that $S''$ should be
closed in $p'' []_2 \delta_e$.

In addition, according to Corollary 6, $\delta''_p | S'' \subseteq \delta_p - mt$.
Finally, according to Assumption 1, all computations of $p [] \delta_e$
that start in $S$ are infinite. Hence, by condition C1, all
computations of $p'' []_2 \delta_e$ that start from a state in $S''$ must
be infinite. Our algorithm declares that no solution for the
addition problem exists only when there is no subset of $S$
satisfying the three requirements above such that all computations
of $(\delta_p - mt) []_2 \delta_e$ within that subset are infinite.

□

Theorem 17. Algorithm 2 is polynomial (in the state space
of $p$).

Proof. The proof follows from the fact that each statement
in Algorithm 2 is executed in polynomial time and the
number of iterations is also polynomial. □

The proof of Theorem 2 follows from Theorems 12, 16, and 17.

D. PROOFS FOR ALGORITHM 3

In this section, we prove the soundness, completeness, and
complexity result of Algorithm 3.

Lemma 5. In all computations $\langle s_0, s_1, \ldots \rangle$ of $p' []_2 \delta_e [] f$
where $s_0 \in S'$, there does not exist $s_i$ such that $s_i$ is in
$ms_1$ in some iteration of the loop on Lines 3-40.

Proof. Consider a computation $\langle s_0, s_1, \ldots \rangle$ of $p' []_2 \delta_e [] f$ where
$s_0 \in S'$, and suppose there exists $s_i$ such that $s_i$ is in $ms_1$ in some
iteration of the loop on Lines 3-40. It follows that $(s_{i-1}, s_i)$ is
in $mt$. By construction, $p'$ does not contain any transition
in $mt$. Thus, $(s_{i-1}, s_i)$ is a transition of $f$ or $\delta_e$. If it is
in $f$, then $s_{i-1} \in ms_1$ (i.e., $s_{i-1} \in ms_2$). If it is in $\delta_e$, then
$s_{i-1} \in ms_2$. Therefore, in both cases, $s_{i-1} \in ms_2$, and
$(s_{i-2}, s_{i-1}) \in mt$. Again, by construction we know that
$\delta'_p$ does not contain any transition in $mt$, so $(s_{i-2}, s_{i-1})$
is either in $f$ or $\delta_e$. If it is in $f$, then $s_{i-2} \in ms_1$ (i.e.,
$s_{i-2} \in ms_2$). If it is in $\delta_e$, two cases are possible:

1) $(s_{i-1}, s_i) \in f$. In this case, as stated before, $s_{i-1} \in ms_1$,
so $s_{i-2} \in ms_2$.

2) $(s_{i-1}, s_i) \in \delta_e$. In this case, as both $(s_{i-2}, s_{i-1})$ and
$(s_{i-1}, s_i)$ are in $\delta_e$, according to the fairness assumption, there
does not exist a transition of $\delta'_p - mt$ starting from $s_{i-1}$, which
means that $s_{i-1}$ is added to $ms_1$ by Line 24, so $s_{i-2} \in ms_2$.²

Continuing this argument further leads to the conclusion
that $s_0 \in ms_2$. This is a contradiction. Thus, in all
computations $\langle s_0, s_1, \ldots \rangle$ of $p' []_2 \delta_e [] f$ where $s_0 \in S'$, there does
not exist $s_i$ such that $s_i$ is in $ms_1$ in some iteration of the loop on
Lines 3-40.

□

Lemma 6. In every computation $\langle s_0, s_1, \ldots \rangle$ of $p' []_k \delta_e [] f$
that starts from a state in $S'$, if there exists a state $s_i$ in $R_p$,
then $(s_{i-1}, s_i) \in \delta_e$.

Proof. Every state in $R_p$ is in $ms_2$, and every transition
to a state in $ms_2$ is in $mt$ in some iteration of the loop on
Lines 3-40. By construction, $p'$ does not contain any transition
which is in $mt$ in some iteration of the loop on Lines 3-40.
Thus, $(s_{i-1}, s_i) \notin \delta'_p$.

In addition, $(s_{i-1}, s_i)$ cannot be in $f$, because if $(s_{i-1}, s_i) \in f$
then $s_{i-1} \in ms_1$, which according to Lemma 5 is impossible. □

² Note that, as $s_{i-1} \in ms_2$, it is not in $S'$. Thus, no transition
of $\delta'_p$ starting from $s_{i-1}$ is removed in the RemoveDeadlock or
EnsureClosure functions.


Lemma 7. For every computation $\langle s_0, s_1, \ldots \rangle$ of $p' []_k \delta_e [] f$
that starts from a state in $S'$ we have: $(\exists i : i > 0 : s_i \in (R \cup R_p) - S') \Rightarrow (\exists j : j > i : s_j \in S')$.

Proof. There are two cases:

Case 1: $s_i \in R$

As $s_i \in R$, according to Lemma 1, $\exists j : j > i : s_j \in S'$.

Case 2: $s_i \in R_p$

As $s_i$ is in $R_p$, there is a program transition from $s_i$ to a state
$s$ in $R$. As $s_0 \in S'$, according to Lemma 6, $(s_{i-1}, s_i) \in \delta_e$,
and because of the fairness assumption the program can reach $R$
using $(s_i, s)$. □

Lemma 8. $R \cup R_p$ is an $f$-span for $p' []_2 \delta_e$ from $S'$.

Proof. By construction, we know that $S' \subseteq R$, thus
$S' \Rightarrow (R \cup R_p)$. Any state in $\neg(R \cup R_p)$ is in $ms_1$ in some
iteration of the loop on Lines 3-40. According to Lemma 5, there
is no computation $\langle s_0, s_1, \ldots \rangle$ of $p' []_2 \delta_e [] f$ where $s_0 \in S'$ such that $s_i$
is in $ms_1$ in some iteration of the loop on Lines 3-40. Therefore,
$R \cup R_p$ is an $f$-span for $p' []_2 \delta_e$.

□

Theorem 18. Algorithm 3 is sound.

Proof. In order to show the soundness of our algorithm,
we need to show that the three conditions of the problem
statement are satisfied.

C1: The satisfaction of C1 for Algorithm 3 is the same as
that for Algorithm 2, stated in the proof of Theorem 12.

C2: From C1 and the assumption that $p []_2 \delta_e$ refines spec
from $S$, $p' []_2 \delta_e$ refines spec from $S'$.

Let spec $= \langle Sf, Lv \rangle$. Consider a prefix $c$ of $p' []_k \delta_e [] f$ such
that $c$ starts from a state in $S'$. If $c$ does not refine $Sf$,
there exists a prefix of $c$, say $\langle s_0, s_1, \ldots, s_n \rangle$, such that it
has a transition in $\delta_b$. Wlog, let $\langle s_0, s_1, \ldots, s_n \rangle$ be the
smallest such prefix. It follows that $(s_{n-1}, s_n) \in \delta_b$. Hence,
$(s_{n-1}, s_n) \in mt$. By construction, $p'$ does not contain any
transition in $mt$. Thus, $(s_{n-1}, s_n)$ is a transition of $f$ or
$\delta_e$. If it is in $f$, then $s_{n-1} \in ms_1$, which is a contradiction
to Lemma 5. If it is in $\delta_e$, then $s_{n-1} \in ms_2$, and
$(s_{n-2}, s_{n-1}) \in mt$. Again, by construction we know that
$\delta'_p$ does not contain any transition in $mt$, so $(s_{n-2}, s_{n-1})$ is
either in $f$ or $\delta_e$. If it is in $f$, then $s_{n-2} \in ms_1$ (a contradiction
to Lemma 5). If it is in $\delta_e$, as both $(s_{n-2}, s_{n-1})$ and
$(s_{n-1}, s_n)$ are in $\delta_e$, according to the fairness assumption, there
should not exist a transition of $\delta'_p - mt$ starting from
$s_{n-1}$, which means that $s_{n-1} \in ms_1$, which is again a
contradiction to Lemma 5. Thus, no prefix of $c$ has
a transition in $\delta_b$. Therefore, any prefix of $p' []_k \delta_e [] f$ refines
$Sf$.

As $p' []_2 \delta_e$ refines spec from $S'$, any prefix of $p' []_2 \delta_e [] f$
refines $Sf$, and according to Lemma 7 and Lemma 8, $p'$
is masking 2-$f$-tolerant to spec from $S'$ in environment $\delta_e$.

C3: Any $(s_0, s_1) \in \delta_r$ is in $mt$. By construction, $p'$ does
not have any transition in $mt$, so C3 holds.

□

Observation 8. In each iteration of the loop on Lines 3-40,
there are two cases for any state $s_0 \in ms_1$:

1. $s_0 \in \neg(R \cup R_p)$

2. so is added by Lines[W\21\ 









Theorem 19. Let $p''$ be any program that solves the problem
of adding masking fault-tolerance and let $S''$ be its invariant.
Then, $S''$ does not include any state in the set $ms_2$
in any iteration of the loop on Lines 3-40.

Proof. The proof of this theorem is based on Observation 8,
the extension of Theorem 13 and Theorem 14 from
failsafe fault-tolerance to masking fault-tolerance, and the
extension of Corollary E from stabilization to masking
fault-tolerance.

□ 

Corollary 7. Let $p''$ be any program that solves the problem
of adding masking fault-tolerance and let $S''$ be its invariant.
Then $p'' | S''$ cannot have any transition in $mt$ in any
iteration of the loop on Lines 3-40.

Theorem 20. Let $p''$ be any program that solves the problem
of adding masking fault-tolerance, and let $S''$ be its invariant.
Then $S''$ cannot include any state in set $ms_4$ in
any iteration of the loop on Lines 30-39.

Proof. The proof of this theorem is the same as that of
Theorem 15. □

Theorem 21. Algorithm 3 is complete.

Proof. Let program $p''$ and predicate $S''$ solve the transformation
problem. $S''$ should satisfy the following requirements:

1. $S''$ does not include any state in set $ms_2$ in any iteration
of the loop on Lines 3-40

2. $S''$ does not include any state in set $ms_4$ in any iteration
of the loop on Lines 30-39

3. $\nexists s_0 : s_0 \in S'' : (\exists s_1 : s_1 \notin S'' : (s_0, s_1) \in \delta_e)$

The first requirement follows from Theorem 19. The
second requirement follows from Theorem 20, and the
third requirement follows from the fact that $S''$ should be
closed in $p'' []_2 \delta_e$.

In addition, according to Corollary 7, $\delta''_p | S''$ cannot have
any transition in $mt$ in any iteration of the loop on Lines 3-40.
Finally, according to Assumption 1, all computations of
$p [] \delta_e$ that start in $S$ are infinite. Hence, by condition C1, all
computations of $p'' []_2 \delta_e$ that start from a state in $S''$ must
be infinite. Our algorithm declares that no solution for the
addition problem exists only when there is no subset of $S$
satisfying the three requirements above such that all computations
of $(\delta_p - mt) []_2 \delta_e$ within that subset are infinite.

□

Theorem 22. Algorithm 3 is polynomial (in the state space
of $p$).

Proof. The proof follows from the fact that each statement
in Algorithm 3 is executed in polynomial time and the
number of iterations is also polynomial. □

The proof of Theorem 3 follows from Theorems 18, 21, and 22.





